"diffusion transformer" Papers
42 papers found
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
Zhen Han, Zeyinzi Jiang, Yulin Pan et al.
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng, Xinyi Wu, Yongxing Yang et al.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision
Weicai Yan, Wang Lin, Zirun Guo et al.
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
Hengyu Fu, Zehao Dou, Jiawei Guo et al.
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Shuang Wu, Youtian Lin, Feihu Zhang et al.
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran et al.
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang, Pengxiang Yan, Jiyang Liu et al.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
Yuxuan Zhang, Yirui Yuan, Yiren Song et al.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.
FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation
Zeyu Zhang, Yiran Wang, Danning Li et al.
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo, Liangbing Zhao, Sayak Paul et al.
IRASim: A Fine-Grained World Model for Robot Manipulation
Fangqi Zhu, Hongtao Wu, Song Guo et al.
Language-Guided Image Tokenization for Generation
Kaiwen Zha, Lijun Yu, Alireza Fathi et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Peng Gao, Le Zhuo, Dongyang Liu et al.
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Yuxuan Bian, Ailing Zeng, Xuan Ju et al.
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
Neural-Driven Image Editing
Pengfei Zhou, Jie Xia, Xiaopeng Peng et al.
OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin, Zhicheng Sun, Ningyuan Li et al.
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang, Long Mai, Aniruddha Mahapatra et al.
ROSE: Remove Objects with Side Effects in Videos
Chenxuan Miao, Yutong Feng, Jianshu Zeng et al.
Stable Flow: Vital Layers for Training-Free Image Editing
Omri Avrahami, Or Patashnik, Ohad Fried et al.
TokMan: Tokenize Manhattan Mask Optimization for Inverse Lithography
Yiwen Wu, Yuyang Chen, Ye Xia et al.
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Zhenghao Zhang, Junchao Liao, Menghao Li et al.
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang, Jinlong Peng, Qingdong He et al.
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
VIRES: Video Instance Repainting via Sketch and Text Guided Generation
Shuchen Weng, Haojie Zheng, Peixuan Zhang et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma, Qirong Peng, Xu Guo et al.
CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
Junchao Gong, Lei Bai, Peng Ye et al.
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Zhuoyi Yang, Heyang Jiang, Wenyi Hong et al.
Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.
Lazy Diffusion Transformer for Interactive Image Editing
Yotam Nitzan, Zongze Wu, Richard Zhang et al.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian GE, Enze Xie et al.
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
Kashyap Chitta, Daniel Dauner, Andreas Geiger