"image-text alignment" Papers

19 papers found

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

Po-han Li, Sandeep Chinchali, Ufuk Topcu

ICLR 2025, arXiv:2410.07610
5 citations

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Aniket Rege, Zinnia Nie, Unmesh Raskar et al.

ICCV 2025, arXiv:2506.08071
4 citations

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences

Hyojin Bahng, Caroline Chan, Fredo Durand et al.

ICCV 2025, arXiv:2506.02095
7 citations

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy

You Li, Fan Ma, Yi Yang

CVPR 2025, arXiv:2411.16752
10 citations

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.

ICCV 2025, arXiv:2509.26231
1 citation

Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability

Zhiyu Zhu, Zhibo Jin, Jiayu Zhang et al.

ICLR 2025, arXiv:2502.14889
3 citations

Open Ad-hoc Categorization with Contextualized Feature Learning

Zilin Wang, Sangwoo Mo, Stella X. Yu et al.

CVPR 2025, arXiv:2512.16202
1 citation

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.

ICLR 2025, arXiv:2410.02740
9 citations

See Further When Clear: Curriculum Consistency Model

Yunpeng Liu, Boxiao Liu, Yi Zhang et al.

CVPR 2025, arXiv:2412.06295
3 citations

UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint

Enis Simsar, Alessio Tonioni, Yongqin Xian et al.

ICCV 2025, arXiv:2412.15216
1 citation

ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Yuxi Ren, Jie Wu, Yanzuo Lu et al.

ECCV 2024, arXiv:2404.04860
10 citations

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024, arXiv:2404.01291
357 citations

Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders

Bumsoo Kim, Jinhyung Kim, Yeonsik Jo et al.

AAAI 2024 (paper), arXiv:2312.12659
5 citations

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

Peipei Liu, Hong Li, Yimo Ren et al.

AAAI 2024 (paper), arXiv:2305.08372
8 citations

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024, arXiv:2403.16005
49 citations

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.

ECCV 2024, arXiv:2312.03766
17 citations

Referring Expression Counting

Siyang Dai, Jun Liu, Ngai-Man Cheung

CVPR 2024 (highlight), arXiv:2505.22850
3 citations

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Mu Cai, Haotian Liu, Yuheng Li et al.

ECCV 2024, arXiv:2410.00905
7 citations

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Trung Dao, Thuan Nguyen, Thanh Van Le et al.

ECCV 2024, arXiv:2408.14176
35 citations