"cross-modal alignment" Papers
55 papers found • Page 2 of 2
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
AAAI 2024 • paper • arXiv:2312.12155 • 41 citations
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian, Yanhua Cheng, Yi Liu et al.
AAAI 2024 • paper • arXiv:2401.00701 • 16 citations
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Jinxia Yang, Bing Su, Xin Zhao et al.
ICML 2024 • oral • arXiv:2405.19654 • 9 citations
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.
CVPR 2024 • arXiv:2304.02970 • 34 citations
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
ICML 2024 • arXiv:2406.02915 • 26 citations