"image-text representation" Papers
2 papers found
Conference
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
CVPR 2025highlightarXiv:2412.04616
15
citations
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.
CVPR 2024arXiv:2404.02145
16
citations