"multimodal language models" Papers
10 papers found
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
CVPR 2025 · arXiv:2408.00754 · 9 citations
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
ICLR 2025 · arXiv:2406.12814 · 81 citations
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh, Kristen Grauman
NeurIPS 2025 · arXiv:2511.13993
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
ICLR 2025 · arXiv:2408.15881 · 38 citations
Mitigating Modal Imbalance in Multimodal Reasoning
Chen Henry Wu, Neil Kale, Aditi Raghunathan
COLM 2025 · arXiv:2510.02608 · 1 citation
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng et al.
ICLR 2025 · arXiv:2406.08407 · 36 citations
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Arijit Ray, Jiafei Duan, Ellis L Brown II et al.
COLM 2025 · arXiv:2412.07755 · 49 citations
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.
CVPR 2025 · arXiv:2312.11556 · 34 citations
Synthetic Visual Genome
Jae Sung Park, Zixian Ma, Linjie Li et al.
CVPR 2025 · arXiv:2506.07643 · 2 citations
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu et al.
ECCV 2024 · arXiv:2402.02544 · 133 citations