"text-to-image generation" Papers

222 papers found • Page 3 of 5

Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.

NEURIPS 2025 · arXiv:2510.18353

RB-Modulation: Training-Free Stylization using Reference-Based Modulation

Litu Rout, Yujia Chen, Nataniel Ruiz et al.

ICLR 2025

Rectified CFG++ for Flow Based Models

Shreshth Saini, Shashank Gupta, Alan Bovik

NEURIPS 2025

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents

Rui Tian, Qi Dai, Jianmin Bao et al.

ICCV 2025 · arXiv:2411.13552
7 citations

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

ICCV 2025 · arXiv:2503.12271
22 citations

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, Yixuan Liu, Takashi Isobe et al.

CVPR 2025 (highlight) · arXiv:2412.19637
6 citations

REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.

NEURIPS 2025
8 citations

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia et al.

AAAI 2025 · arXiv:2403.02084
23 citations

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.

NEURIPS 2025 · arXiv:2509.15257

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.

ICCV 2025 · arXiv:2506.07986
6 citations

Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

Eunji Kim, Siwon Kim, Minjun Park et al.

CVPR 2025 · arXiv:2408.12692
13 citations

Role Bias in Diffusion Models: Diagnosing and Mitigating through Intermediate Decomposition

Sina Malakouti, Adriana Kovashka

NEURIPS 2025 · arXiv:2503.10037

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Junsong Chen, Shuchen Xue, Yuyang Zhao et al.

ICCV 2025 (highlight) · arXiv:2503.09641
41 citations

Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.

ICCV 2025 · arXiv:2410.18013
23 citations

Scaling can lead to compositional generalization

Florian Redhardt, Yassir Akram, Simon Schug

NEURIPS 2025 (spotlight) · arXiv:2507.07207
1 citation

Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation

Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.

CVPR 2025 · arXiv:2412.01243
19 citations

ScImage: How good are multimodal large language models at scientific text-to-image generation?

Leixin Zhang, Steffen Eger, Yinjie Cheng et al.

ICLR 2025 · arXiv:2412.02368
5 citations

SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning

Zhewei Dai, Shilei Zeng, Haotian Liu et al.

ICCV 2025 · arXiv:2410.14987
12 citations

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025 · arXiv:2502.06130
21 citations

Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

Lexiang Xiong, Chengyu Liu, Jingwen Ye et al.

NEURIPS 2025 · arXiv:2510.22851
1 citation

Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch

Xu Cai, Yang Wu, Qianli Chen et al.

NEURIPS 2025 · arXiv:2510.17858

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie, Weijia Mao, Zechen Bai et al.

ICLR 2025 · arXiv:2408.12528
484 citations

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025 · arXiv:2412.05818
10 citations

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

Rohit Gandikota, Zongze Wu, Richard Zhang et al.

ICCV 2025 · arXiv:2502.01639
10 citations

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Shuning Chang, Pichao Wang, Jiasheng Tang et al.

NEURIPS 2025 (oral) · arXiv:2412.06028
3 citations

Sparse Fine-Tuning of Transformers for Generative Tasks

Wei Chen, Jingxi Yu, Zichen Miao et al.

ICCV 2025 · arXiv:2507.10855
3 citations

Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling

Jaskirat Singh, Junshen K Chen, Jonas Kohler et al.

ICLR 2025 · arXiv:2504.05800
3 citations

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 · arXiv:2407.15811
26 citations

StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance

Jaeseok Jeong, Junho Kim, Youngjung Uh et al.

ICCV 2025 · arXiv:2510.06827
2 citations

SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data

Xilin He, Cheng Luo, Xiaole Xian et al.

ICCV 2025 · arXiv:2410.09865
7 citations

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 · arXiv:2505.00703
100 citations

Text-to-Image Rectified Flow as Plug-and-Play Priors

Xiaofeng Yang, Cheng Chen, Xulei Yang et al.

ICLR 2025 · arXiv:2406.03293
23 citations

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Yuanhao Ban, Ruochen Wang, Tianyi Zhou et al.

ICLR 2025 · arXiv:2406.01970
13 citations

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.

CVPR 2025 · arXiv:2412.03178
5 citations

Trade-offs in Image Generation: How Do Different Dimensions Interact?

Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.

ICCV 2025 · arXiv:2507.22100
2 citations

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025 · arXiv:2410.10034
17 citations

UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

Chen Zhao, En Ci, Yunzhe Xu et al.

NEURIPS 2025 · arXiv:2510.20661
9 citations

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.

CVPR 2025 · arXiv:2412.18928
7 citations

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

Tsu-Jui Fu, Yusu Qian, Chen Chen et al.

ICCV 2025 · arXiv:2503.12652
12 citations

Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.

AAAI 2025 · arXiv:2405.15304
51 citations

VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness

SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.

CVPR 2025 · arXiv:2503.16406
4 citations

Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

Wang Lin, Wentao Hu, Liyu Jia et al.

NEURIPS 2025

Visual Persona: Foundation Model for Full-Body Human Customization

Jisu Nam, Soowon Son, Zhan Xu et al.

CVPR 2025 · arXiv:2503.15406
6 citations

VODiff: Controlling Object Visibility Order in Text-to-Image Generation

Dong Liang, Jinyuan Jia, Yuhao Liu et al.

CVPR 2025
3 citations

Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models

Donghoon Ahn, Jiwon Kang, Sanghyun Lee et al.

NEURIPS 2025 · arXiv:2506.10978
1 citation

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

Jian Ma, Qirong Peng, Xu Guo et al.

ICCV 2025 · arXiv:2503.06134
5 citations

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng, Xiang Zhang, Zirui Wang et al.

ICCV 2025 · arXiv:2508.00728
6 citations

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

Lichen Bai, Shitong Shao, Zikai Zhou et al.

ICLR 2025 · arXiv:2412.10891
29 citations

Accelerating Parallel Sampling of Diffusion Models

Zhiwei Tang, Jiasheng Tang, Hao Luo et al.

ICML 2024 · arXiv:2402.09970
25 citations

AFreeCA: Annotation-Free Counting for All

Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh

ECCV 2024 · arXiv:2403.04943
7 citations