Rui Qian (钱锐)

I am a Ph.D. candidate in the Multimedia Lab at The Chinese University of Hong Kong, supervised by Prof. Dahua Lin.

I received my bachelor's degree from the School of Electronic Information and Electrical Engineering at Shanghai Jiao Tong University in 2021, supervised by Prof. Weiyao Lin. During my undergraduate years, I interned in the SenseTime OpenMMLab group, supervised by Dr. Kai Chen. I am also honored to have worked with Prof. Di Hu.

I am interested in computer vision and machine learning, especially self-supervised learning, video understanding, and multi-modal large language models.

Email  /  CV  /  Google Scholar /  Github

News

[2024-07] Two papers accepted to ECCV 2024.

[2023-07] Two papers accepted to ICCV 2023.

[2023-03] One paper accepted to CVPR 2023.

[2022-07] One paper accepted to ECCV 2022.

[2022-06] One paper accepted to ACM MM 2022.

[2022-03] Two papers accepted to CVPR 2022.

Preprints
Streaming Long Video Understanding with Large Language Models
Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang
preprint, 2024
arXiv

Long video understanding with disentangled streaming video encoding and LLM reasoning.

Publications
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian, Shuangrui Ding, Dahua Lin
ECCV, 2024

Efficiently adapt image foundation models to the video domain in an object-centric manner.

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding*, Rui Qian*, Haohang Xu, Dahua Lin, Hongkai Xiong
ECCV, 2024
arXiv, code

Learn robust spatio-temporal correspondence on top of a DINO-pretrained Transformer without any annotation.

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ICCV, 2023
arXiv, pdf, code

Jointly utilize high-level semantics and low-level temporal correspondence for object-centric learning in videos without any supervision.

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian
ICCV, 2023
Project page, arXiv, pdf, code

Propose a token pruning strategy for video Transformers that offers a competitive speed-accuracy trade-off without additional training or parameters.

Static and Dynamic Concepts for Self-supervised Video Representation Learning
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin
ECCV, 2022
arXiv, code

Learn static and dynamic visual concepts in videos to aggregate local patterns with similar semantics, boosting unsupervised video representation learning.

Dual Contrastive Learning for Spatio-temporal Representation
Shuangrui Ding, Rui Qian, Hongkai Xiong
ACM MM, 2022
arXiv, code

A novel dual contrastive formulation that decouples static and dynamic features to mitigate the background bias.

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong
CVPR, 2022
Project page, arXiv, code, Chinese coverage

Mitigate the background bias in self-supervised video representation learning by copy-pasting foregrounds onto other backgrounds.

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu*, Rui Qian*, Hang Zhou*, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
AAAI, 2022
arXiv

Erase the interference in general multi-modal scenes for robust visual sound localization.

TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition
Shuyuan Li*, Huabin Liu*, Rui Qian, Yuxi Li, John See, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin
AAAI, 2022
arXiv, code

Solve action duration misalignment and action evolution misalignment in few-shot settings.

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin
ICCV, 2021
arXiv, code

Self-supervised video representation learning from the perspective of both high-level semantics and lower-level characteristics.

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu, Rui Qian, Minyue Jiang, Tan Xiao, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou
NeurIPS, 2020
arXiv, code

Discriminative sounding objects localization in the cocktail-party scenario in a two-stage manner.

Multiple Sound Sources Localization from Coarse to Fine
Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
ECCV, 2020
arXiv, code

Complex audiovisual scene understanding: associating sound-object pairs from coarse to fine.

Finding Action Tubes with a Sparse-to-Dense Framework
Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Limin Wang, Shugong Xu
AAAI, 2020

Spatio-temporal action localization: localizing 3D action tubes in the temporal and spatial domains.

Awards

Hong Kong PhD Fellowship Scheme. 2021

Outstanding Graduate of Shanghai. 2021

Top 1% Bachelor Thesis Award of Shanghai Jiao Tong University. 2021

SenseTime Scholarship. 2020

Finalist Winner of the Mathematical Contest in Modeling (MCM). 2019

National Scholarship, Ministry of Education of China. 2018

Professional Services

  • Reviewer: ICLR'22/23/24, CVPR'22/23/24, ECCV'22/24, ICCV'23, NeurIPS'22/23/24, ICML'23/24, AAAI'23/24.
  • Misc

It was a great honor to receive my bachelor's degree from Shanghai Jiao Tong University. Those four years were truly memorable, and I am deeply grateful to my friends, colleagues, and professors.

I am fond of Formula 1 and a true Tifoso. I was excited to see Ferrari come back at the beginning of this season and enjoyed the battle between Charles Leclerc and Max Verstappen. As usual, though, Ferrari never fails to disappoint its fans.

Here is my best friend Shuangrui, who is truly talented and interesting. More than ten years of friendship, great company, marvellous collaboration, memorable encouragement, and indispensable shared enjoyment.



Updated in June 2024
Thanks to Jon Barron for this amazing template.