"multimodal interaction" Papers
7 papers found
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
ICLR 2025, arXiv:2403.20271
87 citations
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
ICCV 2025, arXiv:2507.05056
3 citations
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.
ICLR 2025, arXiv:2410.17883
11 citations
LitForager: Exploring Multimodal Literature Foraging Strategies in Immersive Sensemaking
Haoyang Yang, Elliott H Faa, Weijian Liu et al.
ISMAR 2025, arXiv:2508.15043
1 citation
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li, En Yu, Sijia Chen et al.
ICLR 2025, arXiv:2503.10616
8 citations
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
ICLR 2025, arXiv:2502.13508
43 citations
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Jin Gao, Lei Gan, Yuankai Li et al.
ECCV 2024, arXiv:2408.01091
4 citations