Highlight Papers

975 papers found • Page 2 of 20

Boost Your Human Image Generation Model via Direct Preference Optimization

Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee

CVPR 2025highlightarXiv:2405.20216
8
citations

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

ICCV 2025highlightarXiv:2507.02859
3
citations

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

Zesen Cheng, Hang Zhang, Kehan Li et al.

CVPR 2025highlight

BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan, Jiaxin Guo, Chen Wang et al.

ICCV 2025highlightarXiv:2508.04611
4
citations

BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes

Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.

ICCV 2025highlightarXiv:2503.07940
5
citations

BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer

Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.

CVPR 2025highlight
3
citations

CADDreamer: CAD Object Generation from Single-view Images

Yuan Li, Cheng Lin, Yuan Liu et al.

CVPR 2025highlightarXiv:2502.20732
10
citations

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2025highlightarXiv:2412.16155
7
citations

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding

Zhaoran Zhao, Peng Lu, Anran Zhang et al.

CVPR 2025highlight

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

Jingshun Huang, Haitao Lin, Tianyu Wang et al.

CVPR 2025highlightarXiv:2504.11230
3
citations

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning

Kuniaki Saito, Donghyun Kim, Kwanyong Park et al.

ICCV 2025highlightarXiv:2507.01409

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

Yuan Zhou, Qingshan Xu, Jiequan Cui et al.

CVPR 2025highlightarXiv:2411.16170
1
citations

CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design

Weitao Feng, Hang Zhou, Jing Liao et al.

CVPR 2025highlightarXiv:2504.19478
4
citations

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025highlightarXiv:2503.05936
4
citations

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance

Peiqi Chen, Lei Yu, Yi Wan et al.

ICCV 2025highlightarXiv:2507.17312

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

Likai Tian, Jian Zhao, Zechao Hu et al.

CVPR 2025highlight
10
citations

CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching

Jiaqi Li, Yiran Wang, Jinghong Zheng et al.

CVPR 2025highlight

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.

CVPR 2025highlightarXiv:2503.18803
13
citations

ChartCap: Mitigating Hallucination of Dense Chart Captioning

Junyoung Lim, Jaewoo Ahn, Gunhee Kim

ICCV 2025highlightarXiv:2508.03164
2
citations

CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation

Yuxing Long, Jiyao Zhang, Mingjie Pan et al.

CVPR 2025highlightarXiv:2506.09343
2
citations

CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image

Arindam Dutta, Meng Zheng, Zhongpai Gao et al.

ICCV 2025highlightarXiv:2503.15671
4
citations

Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning

Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.

CVPR 2025highlightarXiv:2412.00175
9
citations

ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate

Ming Yan, Xincheng Lin, Yuhua Luo et al.

CVPR 2025highlightarXiv:2503.21268

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Tianyu Huai, Jie Zhou, Xingjiao Wu et al.

CVPR 2025highlightarXiv:2503.00413
10
citations

CObL: Toward Zero-Shot Ordinal Layering without User Prompting

Aneel Damaraju, Dean Hazineh, Todd Zickler

ICCV 2025highlightarXiv:2508.08498

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models

Zichen Miao, WEI CHEN, Qiang Qiu

CVPR 2025highlightarXiv:2503.18337
7
citations

CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Zizhuo Li, Yifan Lu, Linfeng Tang et al.

ICCV 2025highlightarXiv:2503.23925
4
citations

Combinative Matching for Geometric Shape Assembly

Nahyuk Lee, Juhong Min, Junhong Lee et al.

ICCV 2025highlightarXiv:2508.09780

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025highlightarXiv:2406.10462
12
citations

Compositional Caching for Training-free Open-vocabulary Attribute Detection

Marco Garosi, Alessandro Conti, Gaowen Liu et al.

CVPR 2025highlightarXiv:2503.19145
3
citations

Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers

Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon et al.

CVPR 2025highlightarXiv:2507.04388
5
citations

Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness

Junhao Dong, Jiao Liu, Xinghua Qu et al.

ICCV 2025highlight

Consensus-Driven Active Model Selection

Justin Kay, Grant Horn, Subhransu Maji et al.

ICCV 2025highlightarXiv:2507.23771
2
citations

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025highlightarXiv:2508.00427
2
citations

Context-Aware Multimodal Pretraining

Karsten Roth, Zeynep Akata, Dima Damen et al.

CVPR 2025highlightarXiv:2411.15099
4
citations

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

Tuna Meral, Enis Simsar, Federico Tombari et al.

ICCV 2025highlightarXiv:2403.19776
5
citations

CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

Jiaru Zhong, Jiahao Wang, Jiahui Xu et al.

ICCV 2025highlightarXiv:2507.19239
9
citations

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

Jingjing Jiang, Chao Ma, Xurui Song et al.

ICCV 2025highlightarXiv:2507.07424
7
citations

CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation

Bonan Li, Zicheng Zhang, Xingyi Yang et al.

CVPR 2025highlight
1
citations

CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Zongheng Tang, Yi Liu, Yifan Sun et al.

ICCV 2025highlightarXiv:2508.00359
2
citations

CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos

Nikita Karaev, Iurii Makarov, Jianyuan Wang et al.

ICCV 2025highlightarXiv:2410.11831
223
citations

CounterPC: Counterfactual Feature Realignment for Unsupervised Domain Adaptation on Point Clouds

Feng Yang, Yichao Cao, Xiu Su et al.

ICCV 2025highlight

COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts

Jiansheng Li, Xingxuan Zhang, Hao Zou et al.

CVPR 2025highlightarXiv:2504.10158
1
citations

CountSE: Soft Exemplar Open-set Object Counting

Shuai Liu, Peng Zhang, Shiwei Zhang et al.

ICCV 2025highlight

Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting

Hanxi Liu, Yifang Men, Zhouhui Lian

CVPR 2025highlightarXiv:2504.20403
1
citations

CRISP: Object Pose and Shape Estimation with Test-Time Adaptation

Jingnan Shi, Rajat Talak, Harry Zhang et al.

CVPR 2025highlightarXiv:2412.01052
8
citations

Cross-Architecture Distillation Made Simple with Redundancy Suppression

Weijia Zhang, Yuehao Liu, Wu Ran et al.

ICCV 2025highlightarXiv:2507.21844
2
citations

Cross-modal Causal Relation Alignment for Video Question Grounding

weixing chen, Yang Liu, Binglin Chen et al.

CVPR 2025highlightarXiv:2503.07635
8
citations

CrossOver: 3D Scene Cross-Modal Alignment

Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.

CVPR 2025highlightarXiv:2502.15011
8
citations

Cross-View Completion Models are Zero-shot Correspondence Estimators

Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.

CVPR 2025highlightarXiv:2412.09072
19
citations