Most Cited 2025 "weak learner optimization" Papers
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
Xiang Gao, Shuai Yang, Jiaying Liu
AniMo: Species-Aware Model for Text-Driven Animal Motion Generation
Xuan Wang, Kai Ruan, Xing Zhang et al.
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training
Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao, Pranav Virupaksha, Wenqi Jia et al.
MARBLE: Material Recomposition and Blending in CLIP-Space
Ta-Ying Cheng, Prafull Sharma, Mark Boss et al.
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.
Fractal Calibration for Long-tailed Object Detection
Konstantinos Alexandridis, Ismail Elezi, Jiankang Deng et al.
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
Peirong Liu, Ana Lawry Aguila, Juan Iglesias
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Jihoon Kim, Jeongsoo Choi, Jaehun Kim et al.
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection
Houzhang Fang, Xiaolin Wang, Zengyang Li et al.
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita et al.
Omnidirectional Multi-Object Tracking
Kai Luo, Hao Shi, Sheng Wu et al.
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Kai He, Chin-Hsuan Wu, Igor Gilitschenski
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
Jeeyung Kim, Erfan Esmaeili Fakhabi, Qiang Qiu
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
Zezeng Li, Xiaoyu Du, Na Lei et al.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen, Guoqiang Gong, Tianxiang Hao et al.
Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning
Chenjie Hao, Weyl Lu, Yifan Xu et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen, Junhao Dong, Xiaohua Xie
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang, June Suk Choi, Jaehyeong Jo et al.
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
Hao Zhu, Yan Zhu, Jiayu Xiao et al.
PerLA: Perceptive 3D Language Assistant
Guofeng Mei, Wei Lin, Luigi Riz et al.
Improving Transferable Targeted Attacks with Feature Tuning Mixup
Kaisheng Liang, Xuelong Dai, Yanjie Li et al.
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction
Miaowei Wang, Yibo Zhang, Rui Ma et al.
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images
Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang et al.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou, Hui Ren, Yijia Weng et al.
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang, Zhiwu Huang, Xiaopeng Hong
Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
Xingyu Qiu, Mengying Yang, Xinghua Ma et al.
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
Phuc Nguyen, Minh Luu, Anh Tran et al.
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang, Liang Li, Chenggang Yan et al.
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang, Yeda Song, Sungryull Sohn et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu, Kun Yuan, Qizhi Xie et al.
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
Boyong He, Yuxiang Ji, Qianwen Ye et al.
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Junying Wang, Jingyuan Liu, Xin Sun et al.
Extreme Rotation Estimation in the Wild
Hana Bezalel, Dotan Ankri, Ruojin Cai et al.
Exploration-Driven Generative Interactive Environments
Nedko Savov, Naser Kazemi, Mohammad Mahdi et al.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao, Zhenjun Zhao, Haoang Li et al.
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
Junho Kim, Hyunjun Kim, Hosu Lee et al.
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall et al.
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu, Shijie Yang, Xin Liu et al.
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On
Ji Woo Hong, Tri Ton, Trung X. Pham et al.
AIpparel: A Multimodal Foundation Model for Digital Garments
Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan et al.
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
Yian Zhao, Wanshi Xu, Ruochong Zheng et al.
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng et al.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Yousef Yeganeh, Ioannis Charisiadis, Marta Hasny et al.
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy et al.
Learning to Highlight Audio by Watching Movies
Chao Huang, Ruohan Gao, J. M. F. Tsang et al.
MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond
Shenghao Ren, Yi Lu, Jiayi Huang et al.
On the Consistency of Video Large Language Models in Temporal Comprehension
Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang et al.
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman, Haiwen Feng, Michael J. Black et al.
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Yihao Wang, Marcus Klasson, Matias Turkulainen et al.
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Mingtao Guo, Guanyu Xing, Yanli Liu
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang, Gim Hee Lee
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang et al.
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu, Shuaiyong Li, Yongqiang Zhu
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Yang Wu, Yun Zhu, Kaihua Zhang et al.
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
Bikang Pan, Qun Li, Xiaoying Tang et al.
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion
Yiran Wang, Jiaqi Li, Chaoyi Hong et al.
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
Xianwei Zhuang, Zhihong Zhu, Yuxin Xie et al.
URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration
Rui Xu, Yuzhen Niu, Yuezhou Li et al.
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
Mushui Liu, Dong She, Qihan Huang et al.
Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers
Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon et al.
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
Removing Reflections from RAW Photos
Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen et al.
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.
Multi-party Collaborative Attention Control for Image Customization
Han Yang, Chuanguang Yang, Qiuli Wang et al.
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
Chong Bao, Xiyu Zhang, Zehao Yu et al.
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao et al.
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um, Jong Chul Ye
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
Lingyun Zhang, Yu Xie, Yanwei Fu et al.
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
Yihang Luo, Shangchen Zhou, Yushi Lan et al.
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
Baoshun Tong, Hanjiang Lai, Yan Pan et al.
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li, Ka-Hou Chan, Yue Sun et al.
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang, Changxing Ding, Wentao Tan et al.
LUCAS: Layered Universal Codec Avatars
Di Liu, Teng Deng, Giljoo Nam et al.
RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images
Junjin Xiao, Qing Zhang, Yongwei Nie et al.
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li, Lingyun Xu, Mingxu Zhang et al.
Learning Visual Generative Priors without Text
Shuailei Ma, Kecheng Zheng, Ying Wei et al.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.
Binarized Neural Network for Multi-spectral Image Fusion
Junming Hou, Xiaoyu Chen, Ran Ran et al.
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla, Sethuraman T V, Alexander G. Schwing et al.
Anomize: Better Open Vocabulary Video Anomaly Detection
Fei Li, Wenxuan Liu, Jingjing Chen et al.
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety
Ronghang Zhu, Mengxuan Hu, Weiming Zhuang et al.
UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts
Yidi Liu, Dong Li, Xueyang Fu et al.
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction
Dong Li, Wenqi Zhong, Wei Yu et al.
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Jinxi Li, Ziyang Song, Siyuan Zhou et al.
Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators
Bohan Xiao, Peiyong Wang, Qisheng He et al.
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen, Dannong Xu, Junjie Fei et al.
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
Shuo Wang, Wanting Li, Yongcai Wang et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
Patrick Rim, Hyoungseob Park, Suchisrit Gangopadhyay et al.
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging
Ping Wang, Lishun Wang, Gang Qu et al.
CGMatch: A Different Perspective of Semi-supervised Learning
Bo Cheng, Jueqing Lu, Yuan Tian et al.
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition
Duosheng Chen, Shihao Zhou, Jinshan Pan et al.
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video
Honghu Chen, Bo Peng, Yunfan Tao et al.
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising
Yuchen Wang, Hongyuan Wang, Lizhi Wang et al.
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su, Yi Wang, Qiongyang Hu et al.
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
Xuan Zhu, Jijun Xiang, Xianqi Wang et al.
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
Feng Wang, Timing Yang, Yaodong Yu et al.
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
Shu Yang, Chengting Yu, Lei Liu et al.
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong, Chao Liang, Jianwen Jiang et al.
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang, Shaobin Zhuang, Canmiao Fu et al.
HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration
Shaocheng Yan, Yiming Wang, Kaiyan Zhao et al.
Boosting Adversarial Transferability through Augmentation in Hypothesis Space
Yu Guo, Weiquan Liu, Qingshan Xu et al.
EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds
Yujun Liu, Ruisheng Wang, Shangfeng Huang et al.
NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression
Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai et al.
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions
Ting-Hsuan Liao, Yi Zhou, Yu Shen et al.
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
Jiangtong Tan, Hu Yu, Jie Huang et al.
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
Wei Lin, Chenyang Zhao, Antoni B. Chan
Move-in-2D: 2D-Conditioned Human Motion Generation
Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang et al.
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing, Lihao Liu, Yijun Yang et al.
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko, Chang Min Park
Learning from Neighbors: Category Extrapolation for Long-Tail Learning
Shizhen Zhao, Xin Wen, Jiahui Liu et al.
LongDiff: Training-Free Long Video Generation in One Go
Zhuoling Li, Hossein Rahmani, Qiuhong Ke et al.
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
Xianing Chen, Si Huo, Borui Jiang et al.
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi et al.
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
Zhipeng Xu, De Cheng, Xinyang Jiang et al.
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
Yifan Wang, Jian Zhao, Zhaoxin Fan et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.
Open-World Objectness Modeling Unifies Novel Object Detection
Shan Zhang, Yao Ni, Jinhao Du et al.
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen, Liulei Li, Wenguan Wang et al.
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
ZhiFei Chen, Tianshuo Xu, Wenhang Ge et al.
Zero-Shot 4D Lidar Panoptic Segmentation
Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
Eric Hedlin, Munawar Hayat, Fatih Porikli et al.
ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
Yu-Cheng Chiu, Guan-Rong Chen, Zihao Chen et al.
NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation
Qi Bi, Jingjun Yi, Huimin Huang et al.
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi et al.
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, Jidong Kuang, Hongsong Wang et al.
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim et al.
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li, Shaokun Wang, Bo Qian et al.
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu, Filip Ilievski, Cees G. M. Snoek
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
Tianfu Wang, Mingyang Xie, Haoming Cai et al.
Novel View Synthesis with Pixel-Space Diffusion Models
Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
Jiongtong Hu, Wei Zhuo, Jun Cheng et al.
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu Chen, Zhen Wang, Runshi Li et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Woojung Han, Yeonkyung Lee, Chanyoung Kim et al.
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee, Jijun Jiang, Yiran Xu et al.
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
Yufan Ren, Zicong Jiang, Tong Zhang et al.
Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation
Bohao Zhang, Xuejiao Wang, Changbo Wang et al.
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
Juil Koo, Paul Guerrero, Chun-Hao P. Huang et al.
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun et al.
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging
Zhu Liu, Zijun Wang, Jinyuan Liu et al.
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining
Shangquan Sun, Wenqi Ren, Juxiang Zhou et al.
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
Yongli Xiang, Ziming Hong, Lina Yao et al.
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild
Wei Liu, Yufei Chen, Xiaodong Yue
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
Songpengcheng Xia, Yu Zhang, Zhuo Su et al.
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang, Yixuan Li, Yanhong Zeng et al.
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
Jimyeong Kim, Jungwon Park, Yeji Song et al.
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu, Yuxiao Yang, Jialun Liu et al.
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo Liang, Yiwu Zhong, Zi-Yuan Hu et al.
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
Junjia Huang, Pengxiang Yan, Jinhang Cai et al.
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
Han-Hung Lee, Qinghong Han, Angel Chang
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.
Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li et al.
HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction
Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.
Adversarial Robust Memory-Based Continual Learner
Xiaoyue Mi, Fan Tang, Zonghan Yang et al.
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning
Yan Wang, Da-Wei Zhou, Han-Jia Ye
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.
egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
Björn Braun, Rayan Armani, Manuel Meier et al.
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
Shibo Wang, Haonan He, Maria Parelli et al.
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
Cui Miao, Tao Chang, Meihan Wu et al.
C4D: 4D Made from 3D through Dual Correspondences
Shizun Wang, Zhenxiang Jiang, Xingyi Yang et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
Yuru Jia, Valerio Marsocci, Ziyang Gong et al.
4D Visual Pre-training for Robot Learning
Chengkai Hou, Yanjie Ze, Yankai Fu et al.
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts
Minwen Liao, Hao Dong, Xinyi Wang et al.
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger, Florin-Alexandru Vasluianu, Marcos Conde et al.