Poster "multimodal document understanding" Papers
4 papers found
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.
NeurIPS 2025 · arXiv:2502.01341 · 1 citation
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.
ICLR 2025 · arXiv:2412.04626 · 5 citations
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
Zhentao He, Can Zhang, Ziheng Wu et al.
NeurIPS 2025 · arXiv:2506.20168 · 2 citations
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka, Taichi Iki, Taku Hasegawa et al.
CVPR 2025 · arXiv:2504.09795 · 27 citations