Most Cited 2025 "conditional diffusion transformer" Papers
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
End-to-End Entity-Predicate Association Reasoning for Dynamic Scene Graph Generation
LiWei Wang, YanDuo Zhang, Tao Lu et al.
AnomalyCoT: A Multi-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models
Jiaxi Cheng, Yuliang Xu, Shoupeng Wang et al.
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.
Debiased Curriculum Adaptation for Safe Transfer Learning in Chest X-ray Classification
Mingyang Liu, Xinyang Chen, Yang Shu et al.
InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation
Zhuoran Yang, Xi Guo, Chenjing Ding et al.
GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking
Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee et al.
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging
Bo Wang, Dingwei Tan, Yen-Ling Kuo et al.
Rethinking DPO-style Diffusion Aligning Frameworks
XUN WU, Shaohan Huang, Lingjie Jiang et al.
NormalLoc: Visual Localization on Textureless 3D Models using Surface Normals
Jiro Abe, Gaku Nakano, Kazumine Ogura
MMD-Regularized Unbalanced Optimal Transport
SakethaNath Jagarlapudi, Pratik Jawanpuria, Piyushi Manupriya
SPD: Shallow Backdoor Protecting Deep Backdoor Against Backdoor Detection
Shunjie Yuan, Xinghua Li, Xuelin Cao et al.
INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception
yunjiang xu, Yupeng Ouyang, Lingzhi Li et al.
Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features
Wenhuan Huang, Yi JI, guiqian zhu et al.
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
Wang Lin, Qingsong Wang, Yueying Feng et al.
Hunyuan-Portrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu, Zhentao Yu, Zixiang Zhou et al.
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
Mert Bülent Sarıyıldız, Philippe Weinzaepfel, Thomas Lucas et al.
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
Xinlong Ding, Hongwei Yu, Jiawei Li et al.
Task-Aware Clustering for Prompting Vision-Language Models
Fusheng Hao, Fengxiang He, Fuxiang Wu et al.
NGD: Neural Gradient Based Deformation for Monocular Garment Reconstruction
Soham Dasgupta, Shanthika Naik, Preet Savalia et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales
Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
Hairui Ren, Fan Tang, He Zhao et al.
Activating Sparse Part Concepts for 3D Class Incremental Learning
Zhenya Tian, Jun Xiao, Liu lupeng et al.
Lifting the Structural Morphing for Wide-Angle Images Rectification: Unified Content and Boundary Modeling
Wenting Luan, Siqi Lu, Yongbin Zheng et al.
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing
Xuanbai Chen, Xiang Xu, Zhihua Li et al.
PS-EIP: Robust Photometric Stereo Based on Event Interval Profile
Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata et al.
ROAR: Reducing Inversion Error in Generative Image Watermarking
Hanyi Wang, Han Fang, Shi-Lin Wang et al.
Repurposing in AI: A Distinct Approach or an Extension of Creative Problem Solving?
Aissatou Diallo, Antonis Bikakis, Luke Dickens et al.
Three-view Focal Length Recovery From Homographies
Yaqing Ding, Viktor Kocur, Zuzana Berger Haladova et al.
Bayesian-Inspired Space-Time Superpixels
Kent Gauen, Stanley Chan
Learning Normals of Noisy Points by Local Gradient-Aware Surface Filtering
Qing Li, Huifang Feng, Xun Gong et al.
Free2Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim, Bryan Sangwoo Kim, Jong Ye
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration
Lianxin Xie, csbingbing zheng, Si Wu et al.
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du, Hui Li, Han Xu et al.
When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection
Hongliang Zhou, Yongxiang Liu, Canyu Mo et al.
LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning
Jiang Yuan, ji ma, Bo Wang et al.
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Shenxing Wei, Jinxi Li, Yafei YANG et al.
Semantic-guided Camera Ray Regression for Visual Localization
Yesheng Zhang, Xu Zhao
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Sandeep Silwal, David Woodruff, Qiuyi (Richard) Zhang
Polarimetric Neural Field via Unified Complex-Valued Wave Representation
Chu Zhou, Yixin Yang, Junda Liao et al.
High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach
Yuchong Chen, Jian Yu, Shaoyan Gai et al.
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
Yiyuan Zhang, Handong Li, Jing Liu et al.
Learning to See Inside Opaque Liquid Containers using Speckle Vibrometry
Matan Kichler, Shai Bagon, Mark Sheinin
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.
Adversarial Training for Probabilistic Robustness
YI ZHANG, Yuhang Chen, Zhen Chen et al.
CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
Jiaqi Han, Haotian Ye, Puheng Li et al.
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
Chenxiao Yang, Zhiyuan Li, David Wipf
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
Kaiyue Wen, Zhiyuan Li, Jason Wang et al.
Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy
Aditya Ganeshan, Thibault Groueix, Paul Guerrero et al.
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
PhysPDE: Rethinking PDE Discovery and a Physical HYpothesis Selection Benchmark
Mingquan Feng, Yixin Huang, Yizhou Liu et al.
Diagnosing Pretrained Models for Out-of-distribution Detection
Haipeng Xiong, Kai Xu, Angela Yao
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Yida Wang, Xueyang Zhang, Kun Zhan et al.
CoralSRT: Revisiting Coral Reef Semantic Segmentation by Feature Rectifying via Self-supervised Guidance
Zheng Ziqiang, Wong Kwan, Binh-Son Hua et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing, Hao Chen, Binzhu Xie et al.
Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization
Long Xu, Jiakai Wang, Haojie Hao et al.
PVMamba: Parallelizing Vision Mamba via Dynamic State Aggregation
Fei Xie, Zhongdao Wang, Weijia Zhang et al.
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
Xueyu Liu, Rui Wang, Yexin Lai et al.
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
Wenbin Teng, Gonglin Chen, Haiwei Chen et al.
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
Shen Yan, Xingyan Bin, Sijun Zhang et al.
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
Motion-2-to-3: Leveraging 2D Motion Data for 3D Motion Generations
Ruoxi Guo, Huaijin Pi, Zehong Shen et al.
I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
Zhimin Liao, Ping Wei, Ruijie Zhang et al.
InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation
Jungmin Lee, Seonghyuk Hong, Juyong Lee et al.
RIOcc: Efficient Cross-Modal Fusion Transformer with Collaborative Feature Refinement for 3D Semantic Occupancy Prediction
Baojie Fan, Xiaotian Li, Yuhan Zhou et al.
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions
Qiang Li, Jian Ruan, Fanghao Wu et al.
Geometric Alignment and Prior Modulation for View-Guided Point Cloud Completion on Unseen Categories
Jingqiao Xiu, Yicong Li, Na Zhao et al.
Do vision models perceive objects like toddlers?
Arthur Aubret, Jochen Triesch
Open Set Label Shift with Test Time Out-of-Distribution Reference
Changkun Ye, Russell Tsuchida, Lars Petersson et al.
DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models
SeungHoo Hong, GeonHo Son, Juhun Lee et al.
Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics
Tianfang Zhu, Dongli Hu, Jiandong Zhou et al.
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
Ishank Juneja, Carlee Joe-Wong, Osman Yagan
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation
Jingyi Tian, Le Wang, Sanping Zhou et al.
Incomplete Multi-View Multi-label Learning via Disentangled Representation and Label Semantic Embedding
Xu Yan, Jun Yin, Jie Wen
CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition
Xuli Shen, Hua Cai, Weilin Shen et al.
Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection
Ziqi Li, Tao Gao, Yisheng An et al.
PointSR: Self-Regularized Point Supervision for Drone-View Object Detection
Weizhuo Li, Yue Xi, Wenjing Jia et al.
Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning
Menglong Zhang, Fuyuan Qian, Quanying Liu
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
Yunfei Teng, Yuxuan Ren, Kai Chen et al.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella, Davide Talon, Ziyue Liu et al.
Scoring, Remember, and Reference: Catching Camouflaged Objects in Videos
Yuang Feng, Shuyong Gao, Fuzhen Yan et al.
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee et al.
KAN: Kolmogorov–Arnold Networks
Ziming Liu, Yixuan Wang, Sachin Vaidya et al.
GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue, Lin Lu, Yang Liu et al.
Online Clustering with Nearly Optimal Consistency
T-H. Hubert Chan, Shaofeng Jiang, Tianyi Wu et al.
Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination
Chao Pan, Ke Tang, Li Qing et al.
MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li, Wentao Pan, Xiaoyuan Liu et al.
Dual-Rate Dynamic Teacher for Source-Free Domain Adaptive Object Detection
Qi He, Xiao Wu, Jun-Yan He et al.
Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering
Biplab Das, Viswanath Gopalakrishnan
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection
Jinhyung Park, Navyata Sanghvi, Hiroki Adachi et al.
Interpretable point cloud classification using multiple instance learning
Matt De Vries, Reed Naidoo, Olga Fourkioti et al.
Mitigating Geometric Degradation in Fast DownSampling via FastAdapter for Point Cloud Segmentation
Shuofeng Sun, Haibin Yan
TryOn-Refiner: Conditional Rectified-flow-based TryOn Refiner for More Accurate Detail Reconstruction
Wen Qian
Regularized Proportional Fairness Mechanism for Resource Allocation Without Money
Sujay Bhatt, Alec Koppel, Sumitra Ganesh et al.
Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense
Siyu Luan, Zhenyi Wang, Li Shen et al.
Compositional Targeted Multi-Label Universal Perturbations
Hassan Mahmood, Ehsan Elhamifar
Protein Language Model Fitness is a Matter of Preference
Cade Gordon, Amy Lu, Pieter Abbeel
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining
Tong Wang, Mingkang Wang, Zhongze Wang et al.
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
Wang, Zongqing Lu
Learning and aligning single-neuron invariance manifolds in visual cortex
Mohammad Bashiri, Luca Baroni, Ján Antolík et al.
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network
Zhuochen Yu, Bijie Qiu, Andy W. H. Khong
SEHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing
Yiyu Li, Haoyuan Wang, Ke Xu et al.
Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Jiafei Lyu, Mengbei Yan, Zhongjian Qiao et al.
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Runyang Feng, Hyung Jin Chang, Tze Ho Elden Tse et al.
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization
Kai Mao, Ping Wei, Yiyang Lian et al.
C2MIL: Synchronizing Semantic and Topological Causalities in Multiple Instance Learning for Robust and Interpretable Survival Analysis
Min Cen, Zhenfeng Zhuang, Yuzhe Zhang et al.
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng et al.
TARS: Traffic-Aware Radar Scene Flow Estimation
Jialong Wu, Marco Braun, Dominic Spata et al.
EYE3: Turn Anything into Naked-eye 3D
Yingde Song, Zongyuan Yang, Baolin Liu et al.
TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model
Zhichao Zhai, Guikun Chen, Wenguan Wang et al.
Lost in Prediction: Why Social Media Narratives Don't Help Macroeconomic Forecasting?
Almog Gueta, Roi Reichart, Amir Feder et al.
All-Day Multi-Camera Multi-Target Tracking
Huijie Fan, Yu Qiao, Yihao Zhen et al.
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding
Wenbo Chen, Zhen Xu, Ruotao Xu et al.
Conditional Visual Autoregressive Modeling for Pathological Image Restoration
Ziyi Liu, Zhe Xu, Jiabo MA et al.
A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds
Jizong Peng, Tze Ho Elden Tse, Kai Xu et al.
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
Leaps and Bounds: An Improved Point Cloud Winding Number Formulation for Fast Normal Estimation and Surface Reconstruction
Chamin Hewa Koneputugodage, Dylan Campbell, Stephen Gould
Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction
Ning Ni, Libao Zhang
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution
Zeyu Mi, Yu-Bin Yang
LOIRE: LifelOng learning on Incremental data via pre-trained language model gRowth Efficiently
Xue Han, Yitong Wang, Junlan Feng et al.
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds
Jinfeng Xu, Xianzhi Li, Yuan Tang et al.
A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields
Aoxiang Fan, Corentin Dumery, Nicolas Talabot et al.
MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities
Federico Lincetto, Gianluca Agresti, Mattia Rossi et al.
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
Xiaoling Zhou, Zhemg Lee, Wei Ye et al.
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho, Samuel Madden
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
Qiang Zhang, Mengsheng Zhao, Jiawei Liu et al.
Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning
Yiyang Chen, Shanshan Zhao, Lunhao Duan et al.
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction
Shuhang Chen, Xianliang Huang, Zhizhou Zhong et al.
Optimality of Matrix Mechanism on $\ell_p^p$-metric
Zongrui Zou, Jingcheng Liu, Jalaj Upadhyay
OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving
Kota Shimomura, Masaki Nambata, Atsuya Ishikawa et al.
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu, Han Zhang, Zhantao Yang et al.
Exploring Timeline Control for Facial Motion Generation
Yifeng Ma, Jinwei Qi, Chaonan Ji et al.
Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation
Fengchen He, Dayang Zhao, Hao Xu et al.
Aligning Global Semantics and Local Textures in Generative Video Enhancement
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Completing 3D Partial Assemblies with View-Consistent 2D-3D Correspondence
Weihao Wang, Yu Lan, Mingyu You et al.
MDP-Omni: Parameter-free Multimodal Depth Prior-based Sampling for Omnidirectional Stereo Matching
Eunjin Son, HyungGi Jo, Wookyong Kwon et al.
Text-to-Any-Skeleton Motion Generation Without Retargeting
Qingyuan Liu, Ke Lv, Kun Dong et al.
STEP-DETR: Advancing DETR-based Semi-Supervised Object Detection with Super Teacher and Pseudo-Label Guided Text Queries
Tahira Shehzadi, Khurram Azeem Hashmi, Shalini Sarode et al.
Bad-PFL: Exploiting Backdoor Attacks against Personalized Federated Learning
Mingyuan Fan, Zhanyi Hu, Fuyi Wang et al.
KDA: Knowledge Diffusion Alignment with Enhanced Context for Video Temporal Grounding
Ran Ran, Jiwei Wei, Shiyuan He et al.
Be More Specific: Evaluating Object-centric Realism in Synthetic Images
Anqi Liang, Ciprian Adrian Corneanu, Qianli Feng et al.
GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes
Yunxuan Li, Lei Fan, Xiaoying Xing et al.
EDM: Efficient Deep Feature Matching
Xi Li, Tong Rao, Cihui Pan
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang et al.
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Shuhang Chen, Hangjie Yuan, Pengwei Liu et al.
Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos
Vadim Tschernezki, Diane Larlus, Andrea Vedaldi et al.
Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
Daeho Um, Yoonji Lee, Jiwoong Park et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
An Illustrated Guide to Automatic Sparse Differentiation
Adrian Hill, Guillaume Dalle, Alexis Montoison
Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception
Baixuan Lv, Yaohua Zha, Tao Dai et al.
EMoTive: Event-guided Trajectory Modeling for 3D Motion Estimation
Zengyu Wan, Wei Zhai, Yang Cao et al.
Task-Decoupled Bézier Surface Constraint for Uneven Low-Light Image Enhancement
Xingxiang Zhou, Xiangdong Su, Haoran Zhang et al.
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
Saad Lahlali, Sandra Kara, Hejer AMMAR et al.
3D Test-time Adaptation via Graph Spectral Driven Point Shift
Xin Wei, Qin Yang, Yijie Fang et al.
TOTP: Transferable Online Pedestrian Trajectory Prediction with Temporal-Adaptive Mamba Latent Diffusion
Ziyang Ren, Ping Wei, Shangqi Deng et al.
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
Long Ma, Tengyu Ma, Ziye Li et al.
Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection
Qiao Zhang, Mingwen Shao, Xinyuan Chen et al.
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
Yuheng Feng, Changsong Wen, Zelin Peng et al.
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye, Minghang Yu, Chunyan Xu et al.
GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos
Soohyun Lee, SeoYeon Kim, HeeKyung Lee et al.
Spatially-Varying Autofocus
Yingsi Qin, Aswin Sankaranarayanan, Matthew O'Toole
Robust System Identification: Finite-sample Guarantees and Connection to Regularization
Hank Park, Grani A. Hanasusanto, Yingying Li
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy
Zesen Cheng, Hang Zhang, Kehan Li et al.
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei, Hang Wang, Bingbing Ni
Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond
Xin Qiao, Matteo Poggi, Xing Wei et al.
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning
Shouhang Zhu, Chenglin Li, Yuankun Jiang et al.
Font-Agent: Enhancing Font Understanding with Large Language Models
Yingxin Lai, Cuijie Xu, Haitian Shi et al.
Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets
Muhammad Abdullah Jamal, Omid Mohareri
HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
Yung-Hao Yang, Zitang Sun, Taiki Fukiage et al.
STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation
Yisi Luo, Xile Zhao, Kai Ye et al.
3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping
Chenhui Shi, Fulin Tang, Ning An et al.
Intricacies of Feature Geometry in Large Language Models
Satvik Golechha, Lucius Bushnaq, Euan Ong et al.
Shape as Line Segments: Accurate and Flexible Implicit Surface Representation
Siyu Ren, Junhui Hou
NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments
Cyan Subhra Mishra, Deeksha Chaudhary, Jack Sampson et al.
Flow-MIL: Constructing Highly-expressive Latent Feature Space For Whole Slide Image Classification Using Normalizing Flow
Yingfan MA, Bohan An, Ao Shen et al.
SET: Spectral Enhancement for Tiny Object Detection
Huixin Sun, Runqi Wang, Yanjing Li et al.
The Source Image is the Best Attention for Infrared and Visible Image Fusion
Song Wang, Xie Han, Liqun Kuang et al.
Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation
Hyejin Oh, Woo-Shik Kim, Sangyoon Lee et al.
Event-based Visual Vibrometry
Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti et al.
S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
Guangting Zheng, Jiajun Deng, Xiaomeng Chu et al.
Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID
Zechao Hu, Zhengwei Yang, Hao Li et al.
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
Keyizhi Xu, Chi Zhang, Zhan Chen et al.
Towards Human-like Virtual Beings: Simulating Human Behavior in 3D Scenes
CHEN LIANG, Wenguan Wang, Yi Yang
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection
Ming Sun, Rui Wang, Zixuan Zhu et al.
Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
Michał Bortkiewicz, Władysław Pałucki, Vivek Myers et al.
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Ziyue Wang, Yurui Dong, Fuwen Luo et al.
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
Wenhan Wu, Zhishuai Guo, Chen Chen et al.
Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation
Zheng Gao, Jifei Song, Zhensong Zhang et al.
StyleSRN: Scene Text Image Super-Resolution with Text Style Embedding
Shengrong Yuan, Runmin Wang, Ke Hao et al.
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
Dabing Yu, Zheng Gao
Incremental Few-Shot Semantic Segmentation via Multi-Level Switchable Visual Prompts
Maoxian Wan, Kaige Li, Qichuan Geng et al.
Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D
Jiesi Hu, Hanyang Peng, Yanwu Yang et al.
Splat-based 3D Scene Reconstruction with Extreme Motion-blur
Hyeonjoong Jang, Dongyoung Choi, Donggun Kim et al.