"cross-modal retrieval" Papers

34 papers found

Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search

Shuyu Yang, Yaxiong Wang, Li Zhu et al.

ICCV 2025 (highlight) · arXiv:2411.17776
10 citations

CellCLIP - Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning

MingYu Lu, Ethan Weinberger, Chanwoo Kim et al.

NeurIPS 2025 · arXiv:2506.06290
2 citations

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Han Wang, Shengyang Li, Jian Yang et al.

ICCV 2025 · arXiv:2506.22027
6 citations

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

Po-han Li, Sandeep Chinchali, Ufuk Topcu

ICLR 2025 · arXiv:2410.07610
5 citations

Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

Shuang Zhang, Yue Wu, Lei Shi et al.

NeurIPS 2025

Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces

Souhail Hadgi, Luca Moschella, Andrea Santilli et al.

CVPR 2025 · arXiv:2503.05283
2 citations

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NeurIPS 2025 · arXiv:2506.06970
1 citation

Learning Shared Representations from Unpaired Data

Amitai Yacobi, Nir Ben-Ari, Ronen Talmon et al.

NeurIPS 2025 · arXiv:2505.21524

Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval

Jiaxing Li, Lin Jiang, Zeqi Ma et al.

AAAI 2025 · arXiv:2502.19751
2 citations

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition - And Ways to Overcome Them

Harish Haresamudram, Apoorva Beedu, Mashfiqui Rabbi et al.

AAAI 2025 · arXiv:2408.12023
9 citations

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.

ICLR 2025 · arXiv:2411.02571
86 citations

MotionBind: Multi-Modal Human Motion Alignment for Retrieval, Recognition, and Generation

Kaleab Kinfu, René Vidal

NeurIPS 2025 (oral)

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Christopher Liao, Christian So, Theodoros Tsiligkaridis et al.

ICLR 2025 · arXiv:2402.04416
1 citation

NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval

Zengrong Lin, Zheng Wang, Tianwen Qian et al.

CVPR 2025 · arXiv:2503.10526
2 citations

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Davide Caffagni, Sara Sarto, Marcella Cornia et al.

CVPR 2025 · arXiv:2503.01980
6 citations

Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels

Ruitao Pu, Yuan Sun, Yang Qin et al.

AAAI 2025 · arXiv:2501.01699
14 citations

SEGA: Shaping Semantic Geometry for Robust Hashing under Noisy Supervision

Yiyang Gu, Bohan Wu, Qinghua Ran et al.

NeurIPS 2025

SensorLM: Learning the Language of Wearable Sensors

Yuwei Zhang, Kumar Ayush, Siyuan Qiao et al.

NeurIPS 2025 · arXiv:2506.09108
19 citations

SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments

Simon Dahan, Gabriel Bénédict, Logan Williams et al.

ICLR 2025 · arXiv:2501.16471
3 citations

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Haobin Li, Peng Hu, Qianjun Zhang et al.

ICLR 2025 · arXiv:2410.15624
9 citations

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Young Kyun Jang, Ser-Nam Lim

ICCV 2025 · arXiv:2405.14715
2 citations

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025 · arXiv:2410.10034
17 citations

Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification

Zhongao Zhou, Bin Yang, Wenke Huang et al.

NeurIPS 2025 (spotlight)

Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding

Guangyin Bao, Qi Zhang, Zixuan Gong et al.

AAAI 2025 · arXiv:2404.13282
1 citation

An Empirical Study of CLIP for Text-Based Person Search

Min Cao, Yang Bai, Ziyin Zeng et al.

AAAI 2024 · arXiv:2308.10045
98 citations

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 · arXiv:2307.12732
86 citations

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

Bang Yang, Yong Dai, Xuxin Cheng et al.

AAAI 2024 · arXiv:2401.17186
9 citations

Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective

Fangzhou Song, Bin Zhu, Yanbin Hao et al.

ECCV 2024 · arXiv:2312.04763
10 citations

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024 · arXiv:2403.10153
23 citations

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

Zhuohang Dang, Minnan Luo, Chengyou Jia et al.

AAAI 2024 · arXiv:2312.16478
12 citations

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

Samuele Poppi, Tobia Poppi, Federico Cocchi et al.

ECCV 2024 · arXiv:2311.16254
10 citations

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024 (highlight) · arXiv:2403.17998
69 citations

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024 (highlight) · arXiv:2403.00691
15 citations

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

Yifei Ming, Sharon Li

ICML 2024 · arXiv:2405.01468
10 citations