"text-to-image generation" Papers
222 papers found • Page 3 of 5
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.
RB-Modulation: Training-Free Stylization using Reference-Based Modulation
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
Rectified CFG++ for Flow Based Models
Shreshth Saini, Shashank Gupta, Alan Bovik
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Jianmin Bao et al.
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, Yixuan Liu, Takashi Isobe et al.
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Jiaxiang Cheng, Pan Xie, Xin Xia et al.
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
Role Bias in Diffusion Models: Diagnosing and Mitigating through Intermediate Decomposition
Sina Malakouti, Adriana Kovashka
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao et al.
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Scaling can lead to compositional generalization
Florian Redhardt, Yassir Akram, Simon Schug
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
ScImage: How good are multimodal large language models at scientific text-to-image generation?
Leixin Zhang, Steffen Eger, Yinjie Cheng et al.
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning
Zhewei Dai, Shilei Zeng, Haotian Liu et al.
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models
Lexiang Xiong, Chengyu Liu, Jingwen Ye et al.
Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch
Xu Cai, Yang Wu, Qianli Chen et al.
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang et al.
SparseDiT: Token Sparsification for Efficient Diffusion Transformer
Shuning Chang, Pichao Wang, Jiasheng Tang et al.
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao et al.
Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling
Jaskirat Singh, Junshen K Chen, Jonas Kohler et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Youngjung Uh et al.
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He, Cheng Luo, Xiaole Xian et al.
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Dongzhi Jiang, Ziyu Guo, Renrui Zhang et al.
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang, Cheng Chen, Xulei Yang et al.
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Yuanhao Ban, Ruochen Wang, Tianyi Zhou et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.
UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset
Chen Zhao, En Ci, Yunzhe Xu et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu, Yusu Qian, Chen Chen et al.
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient
Yongliang Wu, Shiji Zhou, Mingzhuo Yang et al.
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
Wang Lin, Wentao Hu, Liyu Jia et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models
Donghoon Ahn, Jiwon Kang, Sanghyun Lee et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma, Qirong Peng, Xu Guo et al.
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng, Xiang Zhang, Zirui Wang et al.
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Lichen Bai, Shitong Shao, Zikai Zhou et al.
Accelerating Parallel Sampling of Diffusion Models
Zhiwei Tang, Jiasheng Tang, Hao Luo et al.
AFreeCA: Annotation-Free Counting for All
Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh