CVPR

5,589 papers tracked across 2 years

Top Papers in CVPR 2024

View all papers
#1

Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li et al.

CVPR 2024
4,359
citations
#2

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024
2,565
citations
#3

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024
2,295
citations
#4

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang et al.

CVPR 2024
1,715
citations
#5

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang et al.

CVPR 2024
1,479
citations
#6

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024
1,110
citations
#7

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024
1,072
citations
#8

DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.

CVPR 2024
1,005
citations
#9

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024
902
citations
#10

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024
742
citations
#11

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024
710
citations
#12

VILA: On Pre-training for Visual Language Models

Ji Lin, Danny Yin, Wei Ping et al.

CVPR 2024
701
citations
#13

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu

CVPR 2024
684
citations
#14

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024
682
citations
#15

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.

CVPR 2024
672
citations
#16

SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

Antoine Guédon, Vincent Lepetit

CVPR 2024
654
citations
#17

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv et al.

CVPR 2024
629
citations
#18

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024
627
citations
#19

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024
620
citations
#20

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024
614
citations