"cross-modal interaction" Papers

16 papers found

Filters:cross-modal interaction Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

Qizhi Pei, Rui Yan, Kaiyuan Gao et al.

ICLR 2025arXiv:2406.05797

citations

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Ronghao Lin, Qiaolin He, Sijie Mai et al.

NEURIPS 2025arXiv:2602.04920

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jiaqi Huang, Zunnan Xu, Ting Liu et al.

AAAI 2025paperarXiv:2501.08580

citations

Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples

Yeyuan Wang, Dehong Gao, Lei Yi et al.

AAAI 2025paperarXiv:2412.10029

citations

MokA: Multimodal Low-Rank Adaptation for MLLMs

Yake Wei, Yu Miao, Dongzhan Zhou et al.

NEURIPS 2025oralarXiv:2506.05191

citations

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing

Langyu Wang, Langyu Wang, Yingying Chen et al.

ICCV 2025arXiv:2507.01384

citations

Multimodal 3D Genome Pre-training

Minghao Yang, Pengteng Li, Yan Liang et al.

NEURIPS 2025arXiv:2504.09060

RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection

Xinquan Yu, Ziqi Sheng, Wei Lu et al.

AAAI 2025paperarXiv:2412.18254

citations

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025arXiv:2506.07986

citations

Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes

Feng Huang, Shuyuan Zheng, Zhaobing Qiu et al.

ICCV 2025arXiv:2503.07249

citations

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Sensen Gao, Xiaojun Jia, Xuhong Ren et al.

ECCV 2024arXiv:2403.12445

citations

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024paperarXiv:2401.10588

citations

KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval

Xianwei Zhuang, Hongxiang Li, Xuxin Cheng et al.

ECCV 2024

citations

Libra: Building Decoupled Vision System on Large Language Models

Yifan Xu, Xiaoshan Yang, Yaguang Song et al.

ICML 2024arXiv:2405.10140

citations

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML 2024arXiv:2405.17730

citations

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024paperarXiv:2401.01244

citations