ICCV
2,701 papers tracked across 1 years
Top Papers in ICCV 2025
View all papers →Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
Guowei Xu, Peng Jin, ZiangWu ZiangWu et al.
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
yi yang, Xiaoxuan He, Hongkun Pan et al.
OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos
Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang, zehai he, Wenyi Hong et al.
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
CHENMING ZHU, Tai Wang, Wenwei Zhang et al.
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao, Zitao Liu et al.
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion
Wenqiang Sun, Shuo Chen, Fangfu Liu et al.
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
gaojie lin, Jianwen Jiang, Jiaqi Yang et al.
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen Zhou, Hang Gao, Vikram Voleti et al.
MV-Adapter: Multi-View Consistent Image Generation Made Easy
Zehuan Huang, Yuan-Chen Guo, Haoran Wang et al.
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Xingjian Leng, Jaskirat Singh, Yunzhong Hou et al.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Yuxuan Zhang, Yirui Yuan, Yiren Song et al.
GameFactory: Creating New Games with Generative Interactive Videos
Jiwen Yu, Yiran Qin, Xintao Wang et al.
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu, Diankun Zhang, Zongchuang Zhao et al.
StreamDiffusion: A Pipeline-level Solution for Real-Time Interactive Generation
Akio Kodaira, Chenfeng Xu, Toshiki Hazama et al.