"cross-modal learning" Papers
20 papers found
CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection
Xiaolei Wang, Xiaoyang Wang, Huihui Bai et al.
Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
Dongkwan Lee, JunHoo Lee, Nojun Kwak
Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization
Hao Zheng, Jingjun Yi, Qi Bi et al.
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto, Sascha Hornauer, Fabien Moutarde
RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
Geonho Bang, Minjae Seong, Jisong Kim et al.
Rotary Masked Autoencoders are Versatile Learners
Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi et al.
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Huy Ta, Duy Anh Huynh, Yutong Xie et al.
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng, Yun Xing, Zesen Cheng et al.
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
Zhuo Huang, Gang Niu, Bo Han et al.
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
Faegheh Sardari, Armin Mustafa, Philip JB Jackson et al.
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang, Jiajun Deng, Mingbo Jia
DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition
Sijie Wang, Rui She, Qiyu Kang et al.
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
Peipei Liu, Hong Li, Yimo Ren et al.
LEROjD: Lidar Extended Radar-Only Object Detection
Patrick Palmer, Martin Krüger, Stefan Schütte et al.
Position: Mission Critical – Satellite Data is a Distinct Modality in Machine Learning
Esther Rolf, Konstantin Klemmer, Caleb Robinson et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan et al.