"multimodal instruction tuning" Papers
5 papers found
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
ICCV 2025 (highlight), arXiv:2507.07424
7 citations
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
ICLR 2025, arXiv:2410.13824
22 citations
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Kai Liu, Jungang Li, Yuchong Sun et al.
NeurIPS 2025 (oral), arXiv:2512.22905
7 citations
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang, Shoutao Guo, Yan Zhou et al.
ICLR 2025, arXiv:2409.06666
135 citations
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu, James Liang, Ruixiang Tang et al.
ICLR 2025, arXiv:2503.00723
13 citations