ECCV
2,387 papers tracked across 1 year
Top Papers in ECCV 2024
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
MMBENCH: Is Your Multi-Modal Model an All-around Player?
Yuan Liu, Haodong Duan, Yuanhan Zhang et al.
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen, Jinsong Li, Xiaoyi Dong et al.
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.
Adversarial Diffusion Distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo, Jinmin Li, Tao Dai et al.
Grounding Image Matching in 3D with MASt3R
Vincent Leroy, Yohann Cabon, Jerome Revaud
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li, Chengyao Wang, Jiaya Jia
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
CoTracker: It is Better to Track Together
Nikita Karaev, Ignacio Rocco, Ben Graham et al.
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma, Mark Goldstein, Michael Albergo et al.
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, Manolis Delakis et al.
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li, Xinhao Li, Yi Wang et al.
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima, Katrin Renz, Kashyap Chitta et al.
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.
Wavelet Convolutions for Large Receptive Fields
Shahaf Finder, Roy Amoyal, Eran Treister et al.