"visual question answering" Papers
16 papers found
Consistency of Compositional Generalization Across Multiple Levels
Chuanhao Li, Zhen Li, Chenchen Jing et al.
AAAI 2025 | paper | arXiv:2412.13636
1 citation
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang, Yijun Liu, Fei Yu et al.
AAAI 2025 | paper | arXiv:2407.01081
7 citations
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen, Shangzhe Di, Weidi Xie
AAAI 2025 | paper | arXiv:2408.14469
27 citations
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
Guosheng Zhang, Keyao Wang, Haixiao Yue et al.
AAAI 2025 | paper | arXiv:2501.01720
6 citations
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA
Jian Lan, Diego Frassinelli, Barbara Plank
AAAI 2025 | paper | arXiv:2410.02773
3 citations
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
Quang-Hung Le, Long Hoang Dang, Ngan Hoang Le et al.
AAAI 2025 | paper | arXiv:2412.08125
3 citations
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Chao Pang, Xingxing Weng, Jiang Wu et al.
AAAI 2025 | paper | arXiv:2403.20213
54 citations
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
AAAI 2024 | paper | arXiv:2308.09936
192 citations
BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
Minjun Kim, SeungWoo Song, Youhan Lee et al.
AAAI 2024 | paper | arXiv:2401.06443
10 citations
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024 | paper | arXiv:2308.06394
264 citations
Detection-Based Intermediate Supervision for Visual Question Answering
Yuhang Liu, Daowan Peng, Wei Wei et al.
AAAI 2024 | paper | arXiv:2312.16012
3 citations
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jia Sun et al.
AAAI 2024 | paper | arXiv:2308.11971
20 citations
Image Content Generation with Causal Reasoning
Xiaochuan Li, Baoyu Fan, Run Zhang et al.
AAAI 2024 | paper | arXiv:2312.07132
12 citations
Interactive Visual Task Learning for Robots
Weiwei Gu, Anant Sah, N. Gopalan
AAAI 2024 | paper | arXiv:2312.13219
7 citations
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.
AAAI 2024 | paper | arXiv:2305.14836
271 citations
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai, Shengli Song, Shiqi Meng et al.
AAAI 2024 | paper | arXiv:2312.13594
10 citations