Most Cited CVPR "gpu kernel design" Papers

CVPR 2024 highlight arXiv:2403.15891

#3202

Human Motion Prediction Under Unexpected Perturbation

Jiangbei Yue, Baiyi Li, Julien Pettré et al.

CVPR 2025 arXiv:2412.09545

#3203

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Xueting Li, Ye Yuan, Shalini De Mello et al.

CVPR 2024 arXiv:2403.12236

#3204

Improving Generalization via Meta-Learning on Hard Samples

Nishant Jain, Arun Suggala, Pradeep Shenoy

CVPR 2025 arXiv:2411.16932

#3205

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.

CVPR 2024 arXiv:2507.14559

#3206

LEAD: Exploring Logit Space Evolution for Model Selection

Zixuan Hu, Xiaotong Li, Shixiang Tang et al.

CVPR 2025 arXiv:2412.00719

#3207

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.

#3208

Unsupervised Deep Unrolling Networks for Phase Unwrapping

Zhile Chen, Yuhui Quan, Hui Ji

CVPR 2024 arXiv:2406.06730

#3209

TRINS: Towards Multimodal Language Models that Can Read

Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.

CVPR 2025 arXiv:2412.05538

#3210

Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models

Hao Cheng, Erjia Xiao, Jiayan Yang et al.

CVPR 2025 arXiv:2503.24129

#3211

It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data

Dominik Schnaus, Nikita Araslanov, Daniel Cremers

CVPR 2025 arXiv:2411.01492

#3212

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark

Ming Li, Jike Zhong, Tianle Chen et al.

CVPR 2024 arXiv:2311.13612

#3213

Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning

Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

CVPR 2025 highlight arXiv:2503.04919

#3214

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

Ian Huang, Yanan Bao, Karen Truong et al.

CVPR 2025 highlight arXiv:2411.14628

#3215

HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition

Zimo Wang, Cheng Wang, Taiki Yoshino et al.

#3216

When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach

Tao Ma, Bing Bai, Haozhe Lin et al.

CVPR 2025 arXiv:2412.14456

#3217

LEDiff: Latent Exposure Diffusion for HDR Generation

Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.

CVPR 2025 arXiv:2411.10411

#3218

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation

Markus Karmann, Onay Urfalioglu

CVPR 2024 arXiv:2312.17686

#3219

Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization

Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

CVPR 2025 arXiv:2411.18672

#3220

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Alice Heiman, Xiaoman Zhang, Emma Chen et al.

CVPR 2025 arXiv:2411.18552

#3221

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

Haosen Yang, Adrian Bulat, Isma Hadji et al.

CVPR 2024 arXiv:2406.03461

#3222

Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts

Dominik Scheuble, Chenyang Lei, Mario Bijelic et al.

CVPR 2024 arXiv:2401.04071

#3223

Fun with Flags: Robust Principal Directions via Flag Manifolds

Tolga Birdal, Nathan Mankovich

CVPR 2025 arXiv:2504.02264

#3224

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.

CVPR 2024 arXiv:2403.00939

#3225

G3DR: Generative 3D Reconstruction in ImageNet

Pradyumna Reddy, Ismail Elezi, Jiankang Deng

CVPR 2024 arXiv:2403.03662

#3226

Harnessing Meta-Learning for Improving Full-Frame Video Stabilization

Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.

#3227

Scene Map-based Prompt Tuning for Navigation Instruction Generation

Sheng Fan, Rui Liu, Wenguan Wang et al.

CVPR 2025 highlight arXiv:2503.00383

#3228

Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems

Song Xia, Yi Yu, Wenhan Yang et al.

CVPR 2024 arXiv:2405.20729

#3229

Extreme Point Supervised Instance Segmentation

Hyeonjun Lee, Sehyun Hwang, Suha Kwak

#3230

Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization

Ye Chen, Bingbing Ni, Jinfan Liu et al.

CVPR 2025 arXiv:2509.22412

#3231

FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing

Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah

CVPR 2025 arXiv:2503.06457

#3232

Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning

Yanbiao Ma, Wei Dai, Wenke Huang et al.

CVPR 2024 arXiv:2403.01231

#3233

Benchmarking Segmentation Models with Mask-Preserved Attribute Editing

Zijin Yin, Kongming Liang, Bing Li et al.

CVPR 2025 arXiv:2502.03629

#3234

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.

CVPR 2025 arXiv:2411.14743

#3235

FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

Zhengrui Guo, Conghao Xiong, Jiabo Ma et al.

CVPR 2024 arXiv:2312.13746

#3236

Video Recognition in Portrait Mode

Mingfei Han, Linjie Yang, Xiaojie Jin et al.

CVPR 2025 highlight arXiv:2412.16155

#3237

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2024 arXiv:2403.19474

#3238

SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks

Yaxu Xie, Alain Pagani, Didier Stricker

CVPR 2025 arXiv:2410.16290

#3239

A Unified Model for Compressed Sensing MRI Across Undersampling Patterns

Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.

CVPR 2025 arXiv:2412.01822

#3240

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.

CVPR 2025 arXiv:2506.18335

#3241

Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention

Saad Wazir, Daeyoung Kim

CVPR 2024 arXiv:2312.02480

#3242

Differentiable Point-based Inverse Rendering

Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek

CVPR 2024 arXiv:2404.00301

#3243

Monocular Identity-Conditioned Facial Reflectance Reconstruction

Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.

CVPR 2025 arXiv:2503.17731

#3244

Co-op: Correspondence-based Novel Object Pose Estimation

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2025 arXiv:2504.02508

#3245

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.

CVPR 2024 arXiv:2311.17352

#3246

Efficient Stitchable Task Adaptation

Haoyu He, Zizheng Pan, Jing Liu et al.

CVPR 2025 arXiv:2505.12745

#3247

PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization

Dongkyu Cho, Inwoo Hwang, Sanghack Lee

CVPR 2025 arXiv:2503.08363

#3248

Parametric Point Cloud Completion for Polygonal Surface Reconstruction

Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.

#3249

FluxSpace: Disentangled Semantic Editing in Rectified Flow Models

Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag

CVPR 2025 highlight arXiv:2412.00505

#3250

Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion

Jona Ballé, Luca Versari, Emilien Dupont et al.

CVPR 2025 arXiv:2412.10153

#3251

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.

CVPR 2025 arXiv:2505.22859

#3252

4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians

Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison

CVPR 2025 arXiv:2503.05186

#3253

Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions

Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.

CVPR 2025 highlight arXiv:2504.01955

#3254

Scene-Centric Unsupervised Panoptic Segmentation

Oliver Hahn, Christoph Reich, Nikita Araslanov et al.

CVPR 2025 arXiv:2412.01537

#3255

HandOS: 3D Hand Reconstruction in One Stage

Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.

CVPR 2025 arXiv:2503.21442

#3256

RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

Qiyu Dai, Xingyu Ni, Qianfan Shen et al.

CVPR 2025 arXiv:2501.11043

#3257

BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin et al.

CVPR 2025 highlight arXiv:2412.11441

#3258

UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Yuning Han, Bingyin Zhao, Rui Chu et al.

CVPR 2025 arXiv:2412.00071

#3259

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.

CVPR 2025 highlight arXiv:2412.00782

#3260

Memories of Forgotten Concepts

Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.

CVPR 2025 highlight arXiv:2505.05309

#3261

Augmented Deep Contexts for Spatially Embedded Video Coding

Yifan Bian, Chuanbo Tang, Li Li et al.

CVPR 2025 arXiv:2411.16718

#3262

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

S P Sharan, Minkyu Choi, Sahil Shah et al.

CVPR 2025 arXiv:2503.04501

#3263

IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement

Zhihao Shi, Dong Huo, Yuhongze Zhou et al.

CVPR 2025 arXiv:2503.12242

#3264

RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.

CVPR 2025 arXiv:2506.17891

#3265

Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation

Edward Loo, Jiacheng Deng

CVPR 2025 arXiv:2503.03325

#3266

Golden Cudgel Network for Real-Time Semantic Segmentation

Guoyu Yang, Yuan Wang, Daming Shi et al.

CVPR 2025 arXiv:2504.12104

#3267

Logits DeConfusion with CLIP for Few-Shot Learning

Shuo Li, Fang Liu, Zehua Hao et al.

CVPR 2025 arXiv:2408.15503

#3268

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, Cong Ma et al.

CVPR 2025 arXiv:2503.21659

#3269

InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Kuang Wu, Chuan Yang, Zhanbin Li

CVPR 2025 arXiv:2503.23717

#3270

Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

Yi Liu, Wengen Li, Jihong Guan et al.

CVPR 2025 arXiv:2501.06035

#3271

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.

CVPR 2025 arXiv:2410.13924

#3272

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann et al.

CVPR 2025 arXiv:2412.20596

#3273

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

CVPR 2025 highlight arXiv:2501.03729

#3274

Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.

CVPR 2025 arXiv:2504.19500

#3275

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

Yan Wang, Baoxiong Jia, Ziyu Zhu et al.

CVPR 2025 highlight arXiv:2503.12127

#3276

Hyperbolic Safety-Aware Vision-Language Models

Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.

CVPR 2025 arXiv:2503.15185

#3277

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

Gyeongrok Oh, Sung June Kim, Heeju Ko et al.

CVPR 2025 highlight arXiv:2412.04464

#3278

DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

Ben Kaye, Tomas Jakab, Shangzhe Wu et al.

CVPR 2025 arXiv:2503.01980

#3279

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Davide Caffagni, Sara Sarto, Marcella Cornia et al.

CVPR 2025 arXiv:2403.14539

#3280

Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

Junhyeong Cho, Kim Youwang, Hunmin Yang et al.

CVPR 2025 highlight arXiv:2502.05165

#3281

Multitwine: Multi-Object Compositing with Text and Layout Control

Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.

CVPR 2025 arXiv:2503.10143

#3282

GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping

Jinfeng Liu, Lingtong Kong, Bo Li et al.

CVPR 2025 highlight arXiv:2503.15934

#3283

SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer

Hongda Liu, Longguang Wang, Ye Zhang et al.

CVPR 2025 arXiv:2501.08303

#3284

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

CVPR 2025 highlight arXiv:2502.20134

#3285

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models

Itay Benou, Tammy Riklin Raviv

CVPR 2025 arXiv:2412.03844

#3286

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

Jingyu Lin, Jiaqi Gu, Lubin Fan et al.

CVPR 2025 highlight arXiv:2506.11543

#3287

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025 arXiv:2503.16247

#3288

OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

Max Gutbrod, David Rauber, Danilo Weber Nunes et al.

CVPR 2025 arXiv:2503.19358

#3289

From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang, Hailin Yu, Yichun Shentu et al.

CVPR 2025 highlight arXiv:2411.19474

#3290

Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB

Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.

CVPR 2025 arXiv:2412.05507

#3291

AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration

Jiong Lin, Lechen Zhang, Kwansoo Lee et al.

CVPR 2025 arXiv:2505.07539

#3292

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

Hao Li, Sicheng Li, Xiang Gao et al.

CVPR 2025 highlight arXiv:2411.16310

#3293

Functionality Understanding and Segmentation in 3D Scenes

Jaime Corsetti, Francesco Giuliari, Alice Fasoli et al.

CVPR 2025 arXiv:2412.03517

#3294

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

Lingen Li, Zhaoyang Zhang, Yaowei Li et al.

CVPR 2025 arXiv:2412.03911

#3295

Multi-View Pose-Agnostic Change Localization with Zero Labels

Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim et al.

CVPR 2025 arXiv:2412.01987

#3296

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

Tomas Soucek, Prajwal Gatti, Michael Wray et al.

CVPR 2025 arXiv:2504.10746

#3297

Hearing Anywhere in Any Environment

Xiulong Liu, Anurag Kumar, Paul Calamia et al.

CVPR 2025 arXiv:2411.19895

#3298

GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting

Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.

CVPR 2025 arXiv:2503.17690

#3299

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model

Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.

CVPR 2025 highlight arXiv:2503.06956

#3300

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Jian Jin, Zhenbo Yu, Yang Shen et al.

#3301

Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding

Changshuo Wang, Shuting He, Xiang Fang et al.

#3302

MATCHA: Towards Matching Anything

Fei Xue, Sven Elflein, Laura Leal-Taixe et al.

CVPR 2025 highlight

CVPR 2025 arXiv:2503.19913

#3303

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

Mingju Gao, Yike Pan, Huan-ang Gao et al.

#3304

AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting

Kenghong Lin, Baoquan Zhang, Demin Yu et al.

CVPR 2025 highlight arXiv:2503.13985

#3305

DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Jaewoo Song, Daemin Park, Kanghyun Baek et al.

CVPR 2025 arXiv:2412.19712

#3306

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang et al.

CVPR 2025 arXiv:2504.01872

#3307

CoMatcher: Multi-View Collaborative Feature Matching

Jintao Zhang, Zimin Xia, Mingyue Dong et al.

CVPR 2025 arXiv:2411.15432

#3308

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Qizhou Chen, Chengyu Wang, Dakan Wang et al.

CVPR 2025 arXiv:2411.15556

#3309

ReWind: Understanding Long Videos with Instructed Learnable Memory

Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.

CVPR 2025 arXiv:2503.01653

#3310

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction

Yingxue Xu, Fengtao Zhou, Chenyu Zhao et al.

CVPR 2025 arXiv:2505.11182

#3311

Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning

Yuzhuo Dai, Jiaqi Jin, Zhibin Dong et al.

#3312

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025 arXiv:2505.18582

#3313

On Denoising Walking Videos for Gait Recognition

Dongyang Jin, Chao Fan, Jingzhe Ma et al.

CVPR 2025 arXiv:2411.08466

#3314

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models

Quan Zhang, Jinwei Fang, Rui Yuan et al.

CVPR 2025 arXiv:2503.12758

#3315

VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

Zhifeng Wang, Renjiao Yi, Xin Wen et al.

CVPR 2025 arXiv:2408.15708

#3316

Towards Realistic Example-based Modeling via 3D Gaussian Stitching

Xinyu Gao, Ziyi Yang, Bingchen Gong et al.

CVPR 2025 arXiv:2503.17261

#3317

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu et al.

CVPR 2025 highlight arXiv:2506.02493

#3318

Towards In-the-wild 3D Plane Reconstruction from a Single Image

Jiachen Liu, Rui Yu, Sili Chen et al.

CVPR 2025 arXiv:2503.12982

#3319

SparseAlign: a Fully Sparse Framework for Cooperative Object Detection

Yunshuang Yuan, Yan Xia, Daniel Cremers et al.

CVPR 2025 highlight arXiv:2503.21076

#3320

KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang et al.

CVPR 2025 arXiv:2501.14277

#3321

Dense-SfM: Structure from Motion with Dense Consistent Matching

JongMin Lee, Sungjoo Yoo

CVPR 2025 arXiv:2501.09333

#3322

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.

CVPR 2025 arXiv:2503.23220

#3323

Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection

Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander

CVPR 2025 arXiv:2411.12858

#3324

CDI: Copyrighted Data Identification in Diffusion Models

Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.

CVPR 2025 arXiv:2504.01961

#3325

Learning from Streaming Video with Orthogonal Gradients

Tengda Han, Dilara Gokay, Joseph Heyward et al.

CVPR 2025 arXiv:2412.00148

#3326

Motion Modes: What Could Happen Next?

Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.

CVPR 2025 arXiv:2505.19618

#3327

Rotation-Equivariant Self-Supervised Method in Image Denoising

Hanze Liu, Jiahong Fu, Qi Xie et al.

CVPR 2025 highlight arXiv:2412.19637

#3328

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, Yixuan Liu, Takashi Isobe et al.

CVPR 2025 highlight arXiv:2406.04251

#3329

Improving Gaussian Splatting with Localized Points Management

Haosen Yang, Chenhao Zhang, Wenqing Wang et al.

CVPR 2025 arXiv:2406.10889

#3330

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment

Darshana Saravanan, Varun Gupta, Darshan Singh S et al.

CVPR 2025 arXiv:2407.07174

#3331

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Xiaoding Yuan, Shitao Tang, Kejie Li et al.

CVPR 2025 arXiv:2504.13065

#3332

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

Yang Yue, Yulin Wang, Haojun Jiang et al.

#3333

Uncertain Multimodal Intention and Emotion Understanding in the Wild

Qu Yang, QingHongYa Shi, Tongxin Wang et al.

CVPR 2025 arXiv:2502.19842

#3334

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

Reza Abbasi, Ali Nazari, Aminreza Sefid et al.

CVPR 2025 arXiv:2502.20032

#3335

Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping

Guannan Lai, Yujie Li, Xiangkun Wang et al.

CVPR 2025 arXiv:2505.11800

#3336

Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model

Jian Zhu, He Wang, Yang Xu et al.

#3337

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

Yueru Jia, Jiaming Liu, Sixiang Chen et al.

CVPR 2025 arXiv:2401.12217

#3338

Exploring Simple Open-Vocabulary Semantic Segmentation

Zihang Lai

CVPR 2025 arXiv:2411.18000

#3339

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Shuyang Hao, Bryan Hooi, Jun Liu et al.

CVPR 2025 arXiv:2503.18211

#3340

SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction

Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.

CVPR 2025 arXiv:2411.16799

#3341

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Yuchen Xia, Quan Yuan, Guiyang Luo et al.

CVPR 2025 arXiv:2503.01845

#3342

Denoising Functional Maps: Diffusion Models for Shape Correspondence

Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik

#3343

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

Huangbiao Xu, Xiao Ke, Huanqi Wu et al.

CVPR 2025 arXiv:2504.01515

#3344

Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis

Zixuan Wang, Duo Peng, Feng Chen et al.

CVPR 2025 arXiv:2505.11707

#3345

Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration

Haipeng Fang, Sheng Tang, Juan Cao et al.

CVPR 2025 arXiv:2411.03239

#3346

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Huan Zheng, Wencheng Han, Jianbing Shen

CVPR 2025 arXiv:2504.06666

#3347

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Ruotian Peng, Haiying He, Yake Wei et al.

CVPR 2025 highlight arXiv:2411.15678

#3348

Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.

CVPR 2025 arXiv:2503.01359

#3349

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025 arXiv:2504.08181

#3350

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

Ruineng Li, Daitao Xing, Huiming Sun et al.

CVPR 2025 arXiv:2503.09243

#3351

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Ruihai Wu, Ziyu Zhu, Yuran Wang et al.

CVPR 2025 arXiv:2506.09952

#3352

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Ziyi Wang, Yanran Zhang, Jie Zhou et al.

CVPR 2025 highlight arXiv:2411.15459

#3353

MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking

Xinqi Liu, Li Zhou, Zikun Zhou et al.

#3354

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Weinan Jia, Mengqi Huang, Nan Chen et al.

CVPR 2025 highlight arXiv:2502.07814

#3355

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

Siwei Tu, Ben Fei, Weidong Yang et al.

CVPR 2025 arXiv:2503.09344

#3356

Unified Dense Prediction of Video Diffusion

Lehan Yang, Lu Qi, Xiangtai Li et al.

CVPR 2025 arXiv:2506.08210

#3357

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Andrew Z Wang, Songwei Ge, Tero Karras et al.

CVPR 2025 arXiv:2503.23733

#3358

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Yiyang Du, Xiaochen Wang, Chi Chen et al.

CVPR 2025 arXiv:2507.17083

#3359

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025 arXiv:2412.13573

#3360

Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes

Aodi Li, Liansheng Zhuang, Xiao Long et al.

CVPR 2025 arXiv:2503.15197

#3361

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Feifei Li, Mi Zhang, Yiming Sun et al.

CVPR 2025 arXiv:2412.03752

#3362

Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning

Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.

CVPR 2025 arXiv:2412.21063

#3363

Navigating Image Restoration with VAR’s Distribution Alignment Prior

Siyang Wang, Naishan Zheng, Jie Huang et al.

CVPR 2025 arXiv:2502.01846

#3364

UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping

Aashish Rai, Dilin Wang, Mihir Jain et al.

CVPR 2025 highlight arXiv:2501.11319

#3365

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

Ruojun Xu, Weijie Xi, Xiaodi Wang et al.

#3366

Generative Sparse-View Gaussian Splatting

Hanyang Kong, Xingyi Yang, Xinchao Wang

CVPR 2025 arXiv:2505.05829

#3367

Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

Zhiyuan Chen, Keyi Li, Yifan Jia et al.

CVPR 2025 arXiv:2503.21766

#3368

Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence

Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.

CVPR 2025 arXiv:2502.19781

#3369

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings

Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.

#3370

Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers

Yichen Xiao, Shuai Wang, Dehao Zhang et al.

CVPR 2025 arXiv:2504.08449

#3371

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input

Jian Wang, Rishabh Dabral, Diogo Luvizon et al.

CVPR 2025 arXiv:2406.13378

#3372

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

Zidong Cao, Jinjing Zhu, Weiming Zhang et al.

CVPR 2025 arXiv:2503.18137

#3373

TCFG: Tangential Damping Classifier-free Guidance

Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.

#3374

Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

Wenbing Zhu, Lidong Wang, Ziqing Zhou et al.

CVPR 2025 arXiv:2503.14867

#3375

DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition

Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.

CVPR 2025 arXiv:2503.21780

#3376

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.

CVPR 2025 arXiv:2411.12355

#3377

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo, Liyuan Pan et al.

#3378

Keyframe-Guided Creative Video Inpainting

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

#3379

POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation

Jian Wang, Tianhong Dai, Bingfeng Zhang et al.

#3380

3D-MVP: 3D Multiview Pretraining for Manipulation

Shengyi Qian, Kaichun Mo, Valts Blukis et al.

CVPR 2025 arXiv:2503.15406

#3381

Visual Persona: Foundation Model for Full-Body Human Customization

Jisu Nam, Soowon Son, Zhan Xu et al.

CVPR 2025 arXiv:2411.15555

#3382

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation

Fengfan Zhou, Bangjie Yin, Hefei Ling et al.

#3383

MIRE: Matched Implicit Neural Representations

Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.

CVPR 2025 arXiv:2412.06243

#3384

U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening

Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.

#3385

Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment

Yang Bai, Yucheng Ji, Min Cao et al.

CVPR 2025 arXiv:2504.18509

#3386

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Shivam Duggal, Yushi Hu, Oscar Michel et al.

CVPR 2024 arXiv:2403.16412

#3387

Unsupervised Template-assisted Point Cloud Shape Correspondence Network

Jiacheng Deng, Jiahao Lu, Tianzhu Zhang

CVPR 2024 highlight arXiv:2403.15789

#3388

In-Context Matting

He Guo, Zixuan Ye, Zhiguo Cao et al.

CVPR 2024 arXiv:2403.06205

#3389

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Xingyi Li, Zhiguo Cao, Yizheng Wu et al.

CVPR 2024 arXiv:2311.09104

#3390

Cross-view and Cross-pose Completion for 3D Human Understanding

Matthieu Armando, Salma Galaaoui, Fabien Baradel et al.

#3391

CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering

Shaowei Wang, Lingling Zhang, Longji Zhu et al.

CVPR 2024 arXiv:2404.07603

#3392

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Jihao Liu, Jinliang Zheng, Yu Liu et al.

CVPR 2024 arXiv:2312.09925

#3393

CNC-Net: Self-Supervised Learning for CNC Machining Operations

Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar et al.

#3394

Flexible Depth Completion for Sparse and Varying Point Densities

Jinhyung Park, Yu-Jhe Li, Kris Kitani

CVPR 2024 arXiv:2404.00312

#3395

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Yibo Miao, Yu Lei, Feng Zhou et al.

#3396

3D-Aware Face Editing via Warping-Guided Latent Direction Learning

Yuhao Cheng, Zhuo Chen, Xingyu Ren et al.

#3397

CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images

Changsheng Chen, Liangwei Lin, Yongqi Chen et al.

CVPR 2024 arXiv:2402.18786

#3398

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

Yuchen Pan, Junjun Jiang, Kui Jiang et al.

CVPR 2024 arXiv:2403.19904

#3399

Fully Geometric Panoramic Localization

Junho Kim, Jiwon Jeong, Young Min Kim

CVPR 2024 arXiv:2404.00680

#3400

Learning to Rank Patches for Unbiased Image Redundancy Reduction

Yang Luo, Zhineng Chen, Peng Zhou et al.