"visual document understanding" Papers
4 papers found
Conference
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni, Zhengyuan Yang, Linjie Li et al.
NeurIPS 2025, arXiv:2505.19702
13 citations
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
Hao Guo, Xugong Qin, Jun Jie Ou Yang et al.
CVPR 2025, arXiv:2512.20174
1 citation
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
AAAI 2024, arXiv:2306.01733
58 citations
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
AAAI 2024, arXiv:2401.13313
36 citations