"visual question answering" Papers

16 papers found

Consistency of Compositional Generalization Across Multiple Levels

Chuanhao Li, Zhen Li, Chenchen Jing et al.

AAAI 2025 | paper | arXiv:2412.13636 | 1 citation

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Yuxuan Wang, Yijun Liu, Fei Yu et al.

AAAI 2025 | paper | arXiv:2407.01081 | 7 citations

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Qirui Chen, Shangzhe Di, Weidi Xie

AAAI 2025 | paper | arXiv:2408.14469 | 27 citations

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Guosheng Zhang, Keyao Wang, Haixiao Yue et al.

AAAI 2025 | paper | arXiv:2501.01720 | 6 citations

Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA

Jian Lan, Diego Frassinelli, Barbara Plank

AAAI 2025 | paper | arXiv:2410.02773 | 3 citations

Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models

Quang-Hung Le, Long Hoang Dang, Ngan Hoang Le et al.

AAAI 2025 | paper | arXiv:2412.08125 | 3 citations

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Chao Pang, Xingxing Weng, Jiang Wu et al.

AAAI 2025 | paper | arXiv:2403.20213 | 54 citations

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024 | paper | arXiv:2308.09936 | 192 citations

BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

Minjun Kim, SeungWoo Song, Youhan Lee et al.

AAAI 2024 | paper | arXiv:2401.06443 | 10 citations

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024 | paper | arXiv:2308.06394 | 264 citations

Detection-Based Intermediate Supervision for Visual Question Answering

Yuhang Liu, Daowan Peng, Wei Wei et al.

AAAI 2024 | paper | arXiv:2312.16012 | 3 citations

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

Junyi Chen, Longteng Guo, Jia Sun et al.

AAAI 2024 | paper | arXiv:2308.11971 | 20 citations

Image Content Generation with Causal Reasoning

Xiaochuan Li, Baoyu Fan, Run Zhang et al.

AAAI 2024 | paper | arXiv:2312.07132 | 12 citations

Interactive Visual Task Learning for Robots

Weiwei Gu, Anant Sah, N. Gopalan

AAAI 2024 | paper | arXiv:2312.13219 | 7 citations

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024 | paper | arXiv:2305.14836 | 271 citations

Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA

Chengen Lai, Shengli Song, Shiqi Meng et al.

AAAI 2024 | paper | arXiv:2312.13594 | 10 citations