Most Cited CVPR "gpu kernel design" Papers
5,589 papers found • Page 17 of 28
Panorama Generation From NFoV Image Done Right
Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.
Human Motion Prediction Under Unexpected Perturbation
Jiangbei Yue, Baiyi Li, Julien Pettré et al.
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
Xueting Li, Ye Yuan, Shalini De Mello et al.
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain, Arun Suggala, Pradeep Shenoy
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.
LEAD: Exploring Logit Space Evolution for Model Selection
Zixuan Hu, Xiaotong Li, Shixiang Tang et al.
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation
Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.
Unsupervised Deep Unrolling Networks for Phase Unwrapping
Zhile Chen, Yuhui Quan, Hui Ji
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models
Hao Cheng, Erjia Xiao, Jiayan Yang et al.
It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Dominik Schnaus, Nikita Araslanov, Daniel Cremers
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li, Jike Zhong, Tianle Chen et al.
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang, Yanan Bao, Karen Truong et al.
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
Zimo Wang, Cheng Wang, Taiki Yoshino et al.
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
Tao Ma, Bing Bai, Haozhe Lin et al.
LEDiff: Latent Exposure Diffusion for HDR Generation
Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
Markus Karmann, Onay Urfalioglu
Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization
Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman, Xiaoman Zhang, Emma Chen et al.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Haosen Yang, Adrian Bulat, Isma Hadji et al.
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
Dominik Scheuble, Chenyang Lei, Mario Bijelic et al.
Fun with Flags: Robust Principal Directions via Flag Manifolds
Tolga Birdal, Nathan Mankovich
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy, Ismail Elezi, Jiankang Deng
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.
Scene Map-based Prompt Tuning for Navigation Instruction Generation
Sheng Fan, Rui Liu, Wenguan Wang et al.
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Song Xia, Yi Yu, Wenhan Yang et al.
Extreme Point Supervised Instance Segmentation
Hyeonjun Lee, Sehyun Hwang, Suha Kwak
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen, Bingbing Ni, Jinfan Liu et al.
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning
Yanbiao Ma, Wei Dai, Wenke Huang et al.
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin, Kongming Liang, Bing Li et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo Ma et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie, Alain Pagani, Didier Stricker
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention
Saad Wazir, Daeyoung Kim
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
Efficient Stitchable Task Adaptation
Haoyu He, Zizheng Pan, Jing Liu et al.
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dongkyu Cho, Inwoo Hwang, Sanghack Lee
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
FluxSpace: Disentangled Semantic Editing in Rectified Flow Models
Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont et al.
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians
Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn, Christoph Reich, Nikita Araslanov et al.
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
Qiyu Dai, Xingyu Ni, Qianfan Shen et al.
BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin et al.
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
Yuning Han, Bingyin Zhao, Rui Chu et al.
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
Augmented Deep Contexts for Spatially Embedded Video Coding
Yifan Bian, Chuanbo Tang, Li Li et al.
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S P Sharan, Minkyu Choi, Sahil Shah et al.
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
Zhihao Shi, Dong Huo, Yuhongze Zhou et al.
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
Edward Loo, Jiacheng Deng
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang, Yuan Wang, Daming Shi et al.
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li, Fang Liu, Zehua Hao et al.
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, Cong Ma et al.
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
Kuang Wu, Chuan Yang, Zhanbin Li
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan et al.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji, Silvan Weder, Francis Engelmann et al.
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
Tomer Garber, Tom Tirer
Realistic Test-Time Adaptation of Vision-Language Models
Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Yan Wang, Baoxiong Jia, Ziyu Zhu et al.
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sung June Kim, Heeju Ko et al.
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye, Tomas Jakab, Shangzhe Wu et al.
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Multitwine: Multi-Object Compositing with Text and Layout Control
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.
GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping
Jinfeng Liu, Lingtong Kong, Bo Li et al.
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
Hongda Liu, Longguang Wang, Ye Zhang et al.
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris et al.
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou, Tammy Riklin Raviv
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Jingyu Lin, Jiaqi Gu, Lubin Fan et al.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
Max Gutbrod, David Rauber, Danilo Weber Nunes et al.
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
Zhiwei Huang, Hailin Yu, Yichun Shentu et al.
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB
Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.
AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration
Jiong Lin, Lechen Zhang, Kwansoo Lee et al.
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream
Hao Li, Sicheng Li, Xiang Gao et al.
Functionality Understanding and Segmentation in 3D Scenes
Jaime Corsetti, Francesco Giuliari, Alice Fasoli et al.
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
Lingen Li, Zhaoyang Zhang, Yaowei Li et al.
Multi-View Pose-Agnostic Change Localization with Zero Labels
Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim et al.
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Tomas Soucek, Prajwal Gatti, Michael Wray et al.
Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia et al.
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin, Zhenbo Yu, Yang Shen et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song, Daemin Park, Kanghyun Baek et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
CoMatcher: Multi-View Collaborative Feature Matching
Jintao Zhang, Zimin Xia, Mingyue Dong et al.
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen, Chengyu Wang, Dakan Wang et al.
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction
Yingxue Xu, Fengtao Zhou, Chenyu Zhao et al.
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
Yuzhuo Dai, Jiaqi Jin, Zhibin Dong et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
On Denoising Walking Videos for Gait Recognition
Dongyang Jin, Chao Fan, Jingzhe Ma et al.
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
Quan Zhang, Jinwei Fang, Rui Yuan et al.
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
Zhifeng Wang, Renjiao Yi, Xin Wen et al.
Towards Realistic Example-based Modeling via 3D Gaussian Stitching
Xinyu Gao, Ziyi Yang, Bingchen Gong et al.
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei, Chenyu Lin, Yu Qiu et al.
Towards In-the-wild 3D Plane Reconstruction from a Single Image
Jiachen Liu, Rui Yu, Sili Chen et al.
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection
Yunshuang Yuan, Yan Xia, Daniel Cremers et al.
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu, Zichen Liang, Fei Yang et al.
Dense-SfM: Structure from Motion with Dense Consistent Matching
JongMin Lee, Sungjoo Yoo
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection
Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.
Learning from Streaming Video with Orthogonal Gradients
Tengda Han, Dilara Gokay, Joseph Heyward et al.
Motion Modes: What Could Happen Next?
Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.
Rotation-Equivariant Self-Supervised Method in Image Denoising
Hanze Liu, Jiahong Fu, Qi Xie et al.
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, Yixuan Liu, Takashi Isobe et al.
Improving Gaussian Splatting with Localized Points Management
Haosen Yang, Chenhao Zhang, Wenqing Wang et al.
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan, Varun Gupta, Darshan Singh S et al.
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li et al.
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue, Yulin Wang, Haojun Jiang et al.
Uncertain Multimodal Intention and Emotion Understanding in the Wild
Qu Yang, QingHongYa Shi, Tongxin Wang et al.
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi, Ali Nazari, Aminreza Sefid et al.
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping
Guannan Lai, Yujie Li, Xiangkun Wang et al.
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model
Jian Zhu, He Wang, Yang Xu et al.
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
Yueru Jia, Jiaming Liu, Sixiang Chen et al.
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Jun Liu et al.
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
Yuchen Xia, Quan Yuan, Guiyang Luo et al.
Denoising Functional Maps: Diffusion Models for Shape Correspondence
Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
Huangbiao Xu, Xiao Ke, Huanqi Wu et al.
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang, Duo Peng, Feng Chen et al.
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
Haipeng Fang, Sheng Tang, Juan Cao et al.
Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution
Huan Zheng, Wencheng Han, Jianbing Shen
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng, Haiying He, Yake Wei et al.
Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang et al.
TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation
Ruineng Li, Daitao Xing, Huiming Sun et al.
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
Ruihai Wu, Ziyu Zhu, Yuran Wang et al.
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Ziyi Wang, Yanran Zhang, Jie Zhou et al.
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
Xinqi Liu, Li Zhou, Zikun Zhou et al.
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Weinan Jia, Mengqi Huang, Nan Chen et al.
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
Siwei Tu, Ben Fei, Weidong Yang et al.
Unified Dense Prediction of Video Diffusion
Lehan Yang, Lu Qi, Xiangtai Li et al.
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Andrew Z Wang, Songwei Ge, Tero Karras et al.
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
Yiyang Du, Xiaochen Wang, Chi Chen et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang, Naishan Zheng, Jie Huang et al.
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain et al.
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
Ruojun Xu, Weijie Xi, Xiaodi Wang et al.
Generative Sparse-View Gaussian Splatting
Hanyang Kong, Xingyi Yang, Xinchao Wang
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen, Keyi Li, Yifan Jia et al.
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon et al.
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
Zidong Cao, Jinjing Zhu, Weiming Zhang et al.
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
Wenbing Zhu, Lidong Wang, Ziqing Zhou et al.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
Keyframe-Guided Creative Video Inpainting
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
3D-MVP: 3D Multiview Pretraining for Manipulation
Shengyi Qian, Kaichun Mo, Valts Blukis et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.
U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening
Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
Yang Bai, Yucheng Ji, Min Cao et al.
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel et al.
Unsupervised Template-assisted Point Cloud Shape Correspondence Network
Jiacheng Deng, Jiahao Lu, Tianzhu Zhang
In-Context Matting
He Guo, Zixuan Ye, Zhiguo Cao et al.
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu et al.
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando, Salma Galaaoui, Fabien Baradel et al.
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering
Shaowei Wang, Lingling Zhang, Longji Zhu et al.
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu, Jinliang Zheng, Yu Liu et al.
CNC-Net: Self-Supervised Learning for CNC Machining Operations
Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar et al.
Flexible Depth Completion for Sparse and Varying Point Densities
Jinhyung Park, Yu-Jhe Li, Kris Kitani
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao, Yu Lei, Feng Zhou et al.
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
Yuhao Cheng, Zhuo Chen, Xingyu Ren et al.
CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
Changsheng Chen, Liangwei Lin, Yongqi Chen et al.
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
Yuchen Pan, Junjun Jiang, Kui Jiang et al.
Fully Geometric Panoramic Localization
Junho Kim, Jiwon Jeong, Young Min Kim
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo, Zhineng Chen, Peng Zhou et al.