"multimodal language model" Papers
3 papers found
Visual Representations inside the Language Model
Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.
COLM 2025
2 citations
Learning Video Context as Interleaved Multimodal Sequences
Qinghong Lin, Pengchuan Zhang, Difei Gao et al.
ECCV 2024, arXiv:2407.21757
12 citations
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.
ICML 2024, arXiv:2312.14125
420 citations