"multimodal agents" Papers
6 papers found
Conference
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Chen Chen, Yuchen Hu, Siyin Wang et al.
ICLR 2025, arXiv:2501.17202
22 citations
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao, Zitao Liu et al.
ICCV 2025, arXiv:2406.08451
113 citations
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
NeurIPS 2025, arXiv:2503.10809
10 citations
Perception in Reflection
Yana Wei, Liang Zhao, Kangheng Lin et al.
ICML 2025, arXiv:2504.07165
8 citations
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang, Yinheng Li, Dan Zhao et al.
ICLR 2025, arXiv:2410.19100
26 citations
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen, Dileepa Pitawela, Chongyang Zhao et al.
AAAI 2024, arXiv:2312.15820
19 citations