Rui Qian (钱锐)
I am a Ph.D. candidate in the Multimedia Lab at The Chinese University of Hong Kong, supervised by Prof. Dahua Lin.
I received my bachelor's degree from the School of Electronic Information and Electrical Engineering at Shanghai Jiao Tong University in 2021, where I was supervised by Prof. Weiyao Lin.
During my undergraduate studies, I also interned with the SenseTime OpenMMLab group, supervised by Dr. Kai Chen. I am also honored to have worked with Prof. Di Hu.
I am interested in computer vision and machine learning, especially self-supervised learning, video understanding, and multi-modal large language models.
Email / CV / Google Scholar / Github
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian,
Shuangrui Ding,
Dahua Lin
ECCV, 2024
Efficiently adapt image foundation models to the video domain in an object-centric manner.
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding*,
Rui Qian*,
Haohang Xu,
Dahua Lin,
Hongkai Xiong
ECCV, 2024
arXiv,
code
Learn robust spatio-temporal correspondence on top of a DINO-pretrained Transformer without any annotation.
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ICCV, 2023
arXiv,
pdf,
code
Jointly utilizes high-level semantics and low-level temporal correspondence for object-centric learning in videos without any supervision.
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding,
Peisen Zhao,
Xiaopeng Zhang,
Rui Qian,
Hongkai Xiong,
Qi Tian
ICCV, 2023
Project page,
arXiv,
pdf,
code
Propose a token pruning strategy for video Transformers that offers a competitive speed-accuracy trade-off without additional training or parameters.
Static and Dynamic Concepts for Self-supervised Video Representation Learning
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ECCV, 2022
arXiv,
code
Learn static and dynamic visual concepts in videos to aggregate local patterns with similar semantics, boosting unsupervised video representation learning.
Dual Contrastive Learning for Spatio-temporal Representation
Shuangrui Ding, Rui Qian, Hongkai Xiong
ACM MM, 2022
arXiv,
code
A novel dual contrastive formulation is presented to decouple the static/dynamic features and thus mitigate the background bias.
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding,
Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong
CVPR, 2022
Project page,
arXiv,
code,
Chinese coverage
Mitigate the background bias in self-supervised video representation learning by copy-pasting foregrounds onto other backgrounds.
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu*, Rui Qian*, Hang Zhou*, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
AAAI, 2022
arXiv
Erase the interference in general multi-modal scenes for robust visual sound localization.
TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
Shuyuan Li*, Huabin Liu*, Rui Qian, Yuxi Li, John See, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin
AAAI, 2022
arXiv,
code
Solve action duration misalignment and action evolution misalignment in few-shot settings.
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian,
Yuxi Li,
Huabin Liu,
John See,
Shuangrui Ding,
Xian Liu,
Dian Li,
Weiyao Lin
ICCV, 2021
arXiv,
code
Self-supervised video representation learning from the perspective of both high-level semantics and low-level characteristics.
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu,
Rui Qian,
Minyue Jiang,
Tan Xiao,
Shilei Wen,
Errui Ding,
Weiyao Lin,
Dejing Dou
NeurIPS, 2020
arXiv,
code
Localize discriminative sounding objects in cocktail-party scenarios in a two-stage manner.
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian,
Di Hu,
Heinrich Dinkel,
Mengyue Wu,
Ning Xu,
Weiyao Lin
ECCV, 2020
arXiv,
code
Complex audiovisual scene understanding: associating sound-object pairs from coarse to fine.
Finding Action Tubes with a Sparse-to-Dense Framework
Yuxi Li,
Weiyao Lin,
Tao Wang,
John See,
Rui Qian,
Ning Xu,
Limin Wang,
Shugong Xu
AAAI, 2020
Spatio-temporal action localization: localizing 3D action tubes in the spatial and temporal domains.
Awards
Hong Kong PhD Fellowship Scheme. 2021
Outstanding Graduate of Shanghai. 2021
Top 1% Bachelor Thesis Award of Shanghai Jiao Tong University. 2021
Sensetime Scholarship. 2020
Finalist Winner of the Mathematical Contest in Modeling (MCM). 2019
National Scholarship, Ministry of Education of China. 2018
Professional Services
Reviewer: ICLR'22/23/24, CVPR'22/23/24, ECCV'22/24, ICCV'23, NeurIPS'22/23/24, ICML'23/24, AAAI'23/24.
Misc
1. It was a great honor to be awarded my bachelor's degree from Shanghai Jiao Tong University. Those four years were truly memorable, and I am grateful to my friends, colleagues, and professors.
2. I am fond of Formula 1 and a true Tifoso. I was excited to see Ferrari coming back at the beginning of this season and enjoyed the battle between Charles Leclerc and Max Verstappen. However, as usual, Ferrari never fails to disappoint its fans.
3. Here is my best friend Shuangrui, who is really talented and interesting. More than ten years of friendship, great company, marvellous collaboration, memorable encouragement, and indispensable shared enjoyment.