"multimodal tasks" Papers

12 papers found

DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models

Kaishen Wang, Hengrui Gu, Meijun Gao et al.

ICLR 2025
7 citations

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

Letitia Parcalabescu, Anette Frank

ICLR 2025, arXiv:2404.18624
20 citations

Flexible Frame Selection for Efficient Video Reasoning

Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.

CVPR 2025
10 citations

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025, arXiv:2503.15024
11 citations

HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models

Haoran Li, Yingjie Qin, Baoyuan Ou et al.

NeurIPS 2025 (oral), arXiv:2505.20444
2 citations

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

Seongyun Lee, Geewook Kim, Jiyeon Kim et al.

ICLR 2025, arXiv:2410.07571
4 citations

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025, arXiv:2504.08851
9 citations

Refining CLIP's Spatial Awareness: A Visual-Centric Perspective

Congpei Qiu, Yanhao Wu, Wei Ke et al.

ICLR 2025, arXiv:2504.02328
7 citations

REOBench: Benchmarking Robustness of Earth Observation Foundation Models

Xiang Li, Yong Tao, Siyuan Zhang et al.

NeurIPS 2025, arXiv:2505.16793
3 citations

Teaching Human Behavior Improves Content Understanding Abilities Of VLMs

Somesh Singh, Harini S I, Yaman Singla et al.

ICLR 2025
2 citations

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.

AAAI 2024, arXiv:2401.12863
82 citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024, arXiv:2402.19150
15 citations