"cross-modal learning" Papers

20 papers found

CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection

Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.

AAAI 2025paperarXiv:2501.00346
16
citations

Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning

Dongkwan Lee, JunHoo Lee, Nojun Kwak

NEURIPS 2025arXiv:2510.13865

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng, Jingjun Yi, Qi Bi et al.

NEURIPS 2025

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

ICLR 2025arXiv:2405.18213
9
citations

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion

Geonho Bang, Minjae Seong, Jisong Kim et al.

ICCV 2025arXiv:2509.17712

Rotary Masked Autoencoders are Versatile Learners

Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi et al.

NEURIPS 2025arXiv:2505.20535
1
citations

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Huy Ta, Duy Anh Huynh, Yutong Xie et al.

ICCV 2025highlightarXiv:2505.15123
2
citations

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Sicong Leng, Yun Xing, Zesen Cheng et al.

NEURIPS 2025arXiv:2410.12787
30
citations

Towards Out-of-Modal Generalization without Instance-level Modal Correspondence

Zhuo Huang, Gang Niu, Bo Han et al.

ICLR 2025
3
citations

Vector-ICL: In-context Learning with Continuous Vector Representations

Yufan Zhuang, Chandan Singh, Liyuan Liu et al.

ICLR 2025arXiv:2410.05629
10
citations

WildSAT: Learning Satellite Image Representations from Wildlife Observations

Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.

ICCV 2025arXiv:2412.14428
10
citations

Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li et al.

CVPR 2024highlightarXiv:2309.01327
113
citations

CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

Faegheh Sardari, Armin Mustafa, Philip JB Jackson et al.

ECCV 2024arXiv:2405.10690
11
citations

Cycle-Consistency Learning for Captioning and Grounding

Ning Wang, Jiajun Deng, Mingbo Jia

AAAI 2024paperarXiv:2312.15162
14
citations

DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition

Sijie Wang, Rui She, Qiyu Kang et al.

AAAI 2024paperarXiv:2312.10616
11
citations

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

Peipei Liu, Hong Li, Yimo Ren et al.

AAAI 2024paperarXiv:2305.08372
8
citations

LEROjD: Lidar Extended Radar-Only Object Detection

Patrick Palmer, Martin Krüger, Stefan Schütte et al.

ECCV 2024arXiv:2409.05564
2
citations

Position: Mission Critical – Satellite Data is a Distinct Modality in Machine Learning

Esther Rolf, Konstantin Klemmer, Caleb Robinson et al.

ICML 2024spotlight

Reinforcement Learning Friendly Vision-Language Model for Minecraft

Haobin Jiang, Junpeng Yue, Hao Luo et al.

ECCV 2024arXiv:2303.10571
15
citations

TrajPrompt: Aligning Color Trajectory with Vision-Language Representations

Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan et al.

ECCV 2024