"vision-language datasets" Papers
4 papers found
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
NeurIPS 2025 (spotlight), arXiv:2505.21375
12 citations
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
CVPR 2025, arXiv:2310.14356
5 citations
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.
ECCV 2024, arXiv:2404.19753
100 citations
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024, arXiv:2404.01197
26 citations