Papers by Suyuchen Wang
5 papers found
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.
NeurIPS 2025, arXiv:2502.01341
1 citation
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.
ICLR 2025, arXiv:2412.04626
5 citations
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li, Tianyu Zhang, Zhiqi Bu et al.
ICLR 2025, arXiv:2406.07529
10 citations
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
NeurIPS 2025, arXiv:2505.18962
8 citations
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang, Suyuchen Wang, Lu Li et al.
ICLR 2025, arXiv:2406.06462
11 citations