Poster "vision-language integration" Papers
4 papers found
Conference
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye, Haiyang Xu, Haowei Liu et al.
ICLR 2025, arXiv:2408.04840
243 citations
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam, Colin Conwell, Christopher Wang et al.
ICML 2024, arXiv:2406.14481
18 citations
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu, Yiwei Ma, Xiaoqing Zhang et al.
CVPR 2024, arXiv:2312.12470
92 citations
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu et al.
CVPR 2024, arXiv:2406.11816
116 citations