"cross-modal interaction" Papers
16 papers found
Conference
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Qizhi Pei, Rui Yan, Kaiyuan Gao et al.
ICLR 2025 arXiv:2406.05797
6 citations
CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning
Ronghao Lin, Qiaolin He, Sijie Mai et al.
NeurIPS 2025 arXiv:2602.04920
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
AAAI 2025 paper arXiv:2501.08580
21 citations
Enhancing Fine-Grained Vision-Language Pretraining with Negative Augmented Samples
Yeyuan Wang, Dehong Gao, Lei Yi et al.
AAAI 2025 paper arXiv:2412.10029
4 citations
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei, Yu Miao, Dongzhan Zhou et al.
NeurIPS 2025 oral arXiv:2506.05191
1 citation
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
Langyu Wang, Yingying Chen et al.
ICCV 2025 arXiv:2507.01384
1 citation
Multimodal 3D Genome Pre-training
Minghao Yang, Pengteng Li, Yan Liang et al.
NeurIPS 2025 arXiv:2504.09060
RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection
Xinquan Yu, Ziqi Sheng, Wei Lu et al.
AAAI 2025 paper arXiv:2412.18254
2 citations
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.
ICCV 2025 arXiv:2506.07986
6 citations
Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes
Feng Huang, Shuyuan Zheng, Zhaobing Qiu et al.
ICCV 2025 arXiv:2503.07249
1 citation
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
ECCV 2024 arXiv:2403.12445
34 citations
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.
AAAI 2024 paper arXiv:2401.10588
45 citations
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
Xianwei Zhuang, Hongxiang Li, Xuxin Cheng et al.
ECCV 2024
10 citations
Libra: Building Decoupled Vision System on Large Language Models
Yifan Xu, Xiaoshan Yang, Yaguang Song et al.
ICML 2024 arXiv:2405.10140
10 citations
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
ICML 2024 arXiv:2405.17730
64 citations
Temporal Adaptive RGBT Tracking with Modality Prompt
Hongyu Wang, Xiaotao Liu, Yifan Li et al.
AAAI 2024 paper arXiv:2401.01244
75 citations