"cross-modal interaction" Papers

16 papers found

3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

Qizhi Pei, Rui Yan, Kaiyuan Gao et al.

ICLR 2025, arXiv:2406.05797
6 citations

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Ronghao Lin, Qiaolin He, Sijie Mai et al.

NeurIPS 2025, arXiv:2602.04920

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

Jiaqi Huang, Zunnan Xu, Ting Liu et al.

AAAI 2025, arXiv:2501.08580
21 citations

Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples

Yeyuan Wang, Dehong Gao, Lei Yi et al.

AAAI 2025, arXiv:2412.10029
4 citations

MokA: Multimodal Low-Rank Adaptation for MLLMs

Yake Wei, Yu Miao, Dongzhan Zhou et al.

NeurIPS 2025 (oral), arXiv:2506.05191
1 citation

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing

Langyu Wang, Langyu Wang, Yingying Chen et al.

ICCV 2025, arXiv:2507.01384
1 citation

Multimodal 3D Genome Pre-training

Minghao Yang, Pengteng Li, Yan Liang et al.

NeurIPS 2025, arXiv:2504.09060

RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection

Xinquan Yu, Ziqi Sheng, Wei Lu et al.

AAAI 2025, arXiv:2412.18254
2 citations

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025, arXiv:2506.07986
6 citations

Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes

Feng Huang, Shuyuan Zheng, Zhaobing Qiu et al.

ICCV 2025, arXiv:2503.07249
1 citation

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Sensen Gao, Xiaojun Jia, Xuhong Ren et al.

ECCV 2024, arXiv:2403.12445
34 citations

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.

AAAI 2024, arXiv:2401.10588
45 citations

KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval

Xianwei Zhuang, Hongxiang Li, Xuxin Cheng et al.

ECCV 2024
10 citations

Libra: Building Decoupled Vision System on Large Language Models

Yifan Xu, Xiaoshan Yang, Yaguang Song et al.

ICML 2024, arXiv:2405.10140
10 citations

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML 2024, arXiv:2405.17730
64 citations

Temporal Adaptive RGBT Tracking with Modality Prompt

Hongyu Wang, Xiaotao Liu, Yifan Li et al.

AAAI 2024, arXiv:2401.01244
75 citations