"vision-language integration" Papers
8 papers found
Crafting Dynamic Virtual Activities with Advanced Multimodal Models
Changyang Li, Qingan Yan, Minyoung Kim et al.
ISMAR 2025 (paper) · arXiv:2406.17582
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
ICCV 2025 (highlight) · arXiv:2502.06788
19 citations
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
ICLR 2025 · arXiv:2408.04840
243 citations
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
Yi Chen, Jian Xu, Xu-Yao Zhang et al.
AAAI 2025 (paper) · arXiv:2409.01179
15 citations
Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering
Haoyu Zhang, Meng Liu, Zixin Liu et al.
ICML 2024 (oral)
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam, Colin Conwell, Christopher Wang et al.
ICML 2024 · arXiv:2406.14481
18 citations
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu, Yiwei Ma, Xiaoqing Zhang et al.
CVPR 2024 · arXiv:2312.12470
92 citations
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu et al.
CVPR 2024 · arXiv:2406.11816
116 citations