Qi Dai
23 papers, 2,737 total citations
papers (23)
SimMIM: A Simple Framework for Masked Image Modeling | CVPR 2022 | arXiv | 1,673 citations
Weakly-Supervised Action Localization by Generative Attention Modeling | CVPR 2020 | arXiv | 167 citations
Rethinking Spatial Invariance of Convolutional Networks for Object Counting | CVPR 2022 | arXiv | 125 citations
SVFormer: Semi-Supervised Video Transformer for Action Recognition | CVPR 2023 | arXiv | 121 citations
SimDA: Simple Diffusion Adapter for Efficient Video Generation | CVPR 2024 | arXiv | 107 citations
On Data Scaling in Masked Image Modeling | CVPR 2023 | arXiv | 71 citations
StableAnimator: High-Quality Identity-Preserving Human Image Animation | CVPR 2025 | arXiv | 64 citations
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | ICCV 2023 | arXiv | 61 citations
MotionEditor: Editing Video Motion via Content-Aware Diffusion | CVPR 2024 | arXiv | 60 citations
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token | ICCV 2023 | arXiv | 60 citations
ResFormer: Scaling ViTs With Multi-Resolution Training | CVPR 2023 | arXiv | 51 citations
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules | ICCV 2023 | arXiv | 31 citations
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation | CVPR 2024 | arXiv | 28 citations
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | ICCV 2025 | arXiv | 24 citations
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | CVPR 2025 | arXiv | 23 citations
MotionFollower: Editing Video Motion via Score-Guided Diffusion | ICCV 2025 | 22 citations
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | ICCV 2025 | arXiv | 20 citations
Temporal Action Detection With Multi-Level Supervision | ICCV 2021 | arXiv | 17 citations
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents | ICCV 2025 | arXiv | 7 citations
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers | ICCV 2025 | arXiv | 4 citations
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network | AAAI 2025 | 1 citation
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion | CVPR 2025 | 0 citations
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition | CVPR 2024 | 0 citations