Rui Qian

Rui Qian (钱锐)

I am a researcher at ByteDance Seed. I obtained my Ph.D. degree from The Chinese University of Hong Kong MMLab, supervised by Prof. Dahua Lin. I got my bachelor's degree from School of Electronic Information and Electrical Engineering at Shanghai Jiao Tong University in 2021, supervised by Prof. Weiyao Lin.

I am interested in computer vision and machine learning, especially self-supervised learning, video understanding and multi-modal large language models.

Email / CV / Google Scholar / Github

Technical Report

Seed1.8 Model Card: Towards Generalized Real-World Agency
ByteDance Seed
Model card, 2025
Project Page | PDF | cookbook

Seed1.5-VL Technical Report
ByteDance Seed
Tech report, 2025
Project Page | arXiv | cookbook

Publications

	SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang ICCV, 2025 Project Page \| arXiv \| code \| PDF Outperform SAM 2 by a large margin through a training-free memory tree.
	Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Rui Qian, Shuangrui Ding, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang CVPR, 2025 arXiv / code Asynchronous operation of disentangled perception, decision, and reaction modules for online video LLMs.
	Streaming Long Video Understanding with Large Language Models Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang NeurIPS, 2024 arXiv Long video understanding with disentangled streaming video encoding and LLM reasoning.
	Rethinking Image-to-Video Adaptation: An Object- centric Perspective Rui Qian, Shuangrui Ding, Dahua Lin ECCV, 2024 Efficiently adapt image foundation models to video domain in an object-centric manner.
	Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation Shuangrui Ding, Rui Qian*, Haohang Xu, Dahua Lin, Hongkai Xiong ECCV*, 2024 arXiv, code Learn robust spatio-temporal corrependence on top of DINO-pretrained Transformer without any annotation.
	Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin ICCV, 2023 arXiv, pdf, code Jointly utilizes high-level semantics and low-level temporal correspondence for object-centric learning in videos without any supervision.
	Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian ICCV, 2023 Project page, arXiv, pdf, code Propose token pruning strategy for video Transformers to offer a competitive speed-accuracy trade-off without additional training or parameters.
	Static and Dynamic Concepts for Self-supervised Video Representation Learning Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin ECCV, 2022 arXiv, code Learn static and dynamic visual concepts in videos to aggregate local patterns with similar semantics to boost unsupervised video representation.
	Dual Contrastive Learning for Spatio-temporal Representation Shuangrui Ding, Rui Qian, Hongkai Xiong ACM MM, 2022 arXiv, code A novel dual contrastive formulation is presented to decouple the static/dynamic features and thus mitigate the background bias.
	Motion-aware Contrastive Video Representation Learning via Foreground-background Merging Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong CVPR, 2022 Project page, arXiv, code, Chinese coverage Mitigate the background bias in self-supervised video representation learning via copy-pasting the foreground onto the other backgrounds.
	Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou AAAI*, 2022 arXiv Erase the interference in general multi-modal scenes for robust visual sound localization.
	TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition Shuyuan Li, Huabin Liu, Rui Qian, Yuxi Li, John See, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin AAAI, 2022 arXiv, code Solve action duration misalignment and action evolution misalignment in few-shot settings.
	Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin ICCV, 2021 arXiv, code Self-supervised video representation learning from the perspective of both high-level semantics and lower-level characteristics
	Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching Di Hu, Rui Qian, Minyue Jiang, Tan Xiao, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou NeurIPS, 2020 arXiv, code Discriminative sounding objects localization in cocktail-party scenario in a two-stage manner
	Multiple Sound Sources Localization from Coarse to Fine Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin ECCV, 2020 arXiv, code Complex audiovisual scene understanding, to associate sound-object pairs from coarse to fine
	Finding Action Tubes with a Sparse-to-Dense Framework Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Limin Wang, Shugong Xu AAAI, 2020 Spatio-temporal action localization, to localize 3D action tubes in temporal and spatial domain

Awards

Hong Kong PhD Fellowship Scheme. 2021

Outstanding Graduate of Shanghai. 2021

Top 1% Bachelor Thesis Award of Shanghai Jiao Tong University. 2021

Sensetime Scholarship. 2020

Finalist winner of MCM. 2019

Nation Scholarship, Ministry of Education of China. 2018

Professional Services

Reviewer: ICLR'22/23/24, CVPR'22/23/24, ECCV'22/24, ICCV'23, NeurIPS'22/23/24, ICML'23/24, AAAI'23/24.

Misc

1. Great honor to be awarded the bachelor's degree from Shanghai Jiao Tong University, really memorable four years and great thanks for the friends, colleagues and professors.

2. I am fond of Formula 1 and a real Tifosi. Excited to see Ferrari coming back at the beginning of this season and enjoy the battle between Charles Leclerc and Max Verstappen. However, Ferrari never disappoints fans in terms of letting fans disappointed as usual.

3. Here is my best friend Shuangrui, who is really talented and interesting. More than ten years' friendship, great company, marvellous collaboration, memorable encouragement and indispensible shared enjoyment.

Updated at Jun. 2025

Thanks Jon Barron for this amazing template.