"diffusion transformers" Papers

47 papers found

Accelerating Diffusion Transformers with Token-wise Feature Caching

Chang Zou, Xuyang Liu, Ting Liu et al.

ICLR 2025 · arXiv:2410.05317 · 69 citations

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Zongyi Li, Shujie Hu, Shujie Liu et al.

ICLR 2025 (oral) · arXiv:2410.20502 · 28 citations

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025 · arXiv:2503.15927 · 12 citations

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Songhua Liu, Zhenxiong Tan, Xinchao Wang

NeurIPS 2025 · arXiv:2412.16112 · 20 citations

Diffusion on Demand: Selective Caching and Modulation for Efficient Generation

Hee Min Choi, Hyoa Kang, Dokwan Oh et al.

NeurIPS 2025

Diffusion Transformers as Open-World Spatiotemporal Foundation Models

Yuan Yuan, Chonghua Han, Jingtao Ding et al.

NeurIPS 2025 (oral) · arXiv:2411.12164 · 10 citations

Diffusion Transformers for Tabular Data Time Series Generation

Fabrizio Garuti, Enver Sangineto, Simone Luetto et al.

ICLR 2025 · arXiv:2504.07566 · 3 citations

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025 · arXiv:2406.11427 · 28 citations

Dual Prompting Image Restoration with Diffusion Transformers

Dehong Kong, Fan Li, Zhixin Wang et al.

CVPR 2025 · arXiv:2504.17825 · 9 citations

EDiT: Efficient Diffusion Transformers with Linear Compressed Attention

Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.

ICCV 2025 · arXiv:2503.16726 · 5 citations

Emergent Temporal Correspondences from Video Diffusion Transformers

Jisu Nam, Soowon Son, Dahyun Chung et al.

NeurIPS 2025 (oral) · arXiv:2506.17220 · 11 citations

Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning

Yu Zhang, Jialei Zhou, Xinchen Li et al.

NeurIPS 2025 · arXiv:2505.19261 · 7 citations

Exploring Diffusion Transformer Designs via Grafting

Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.

NeurIPS 2025 (oral) · arXiv:2506.05340 · 5 citations

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim et al.

CVPR 2025 (highlight) · arXiv:2502.20126 · 5 citations

FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers

Yanbing Zhang, Zhe Wang, Qin Zhou et al.

ICCV 2025 · arXiv:2507.15249 · 1 citation

From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.

ICCV 2025 · arXiv:2503.06923 · 37 citations

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.

ICCV 2025 · arXiv:2503.17539 · 1 citation

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia et al.

ICCV 2025 (highlight) · arXiv:2503.16418 · 29 citations

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.

ICCV 2025 · arXiv:2505.00482 · 4 citations

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 · arXiv:2412.12444 · 38 citations

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Shen Zhang, Siyuan Liang, Yaning Tan et al.

NeurIPS 2025 · arXiv:2503.04344 · 1 citation

Localizing Knowledge in Diffusion Transformers

Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.

NeurIPS 2025 · arXiv:2505.18832 · 2 citations

Long Context Tuning for Video Generation

Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.

ICCV 2025 · arXiv:2503.10589 · 60 citations

LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers

Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag

NeurIPS 2025 (spotlight) · arXiv:2505.23758 · 6 citations

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

Ziqiao Peng, Jiwen Liu, Haoxian Zhang et al.

NeurIPS 2025 (oral) · arXiv:2505.21448 · 15 citations

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan et al.

NeurIPS 2025 · arXiv:2506.05573 · 31 citations

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Weifeng Lin, Xinyu Wei, Renrui Zhang et al.

ICLR 2025 · arXiv:2409.15278 · 26 citations

Presto! Distilling Steps and Layers for Accelerating Music Generation

Zachary Novack, Ge Zhu, Jonah Casebeer et al.

ICLR 2025 · arXiv:2410.05167 · 16 citations

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Shufan Li, Konstantinos Kallidromitis, Akash Gokul et al.

ICCV 2025 · arXiv:2503.12271 · 22 citations

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li et al.

NeurIPS 2025 · arXiv:2506.02528 · 15 citations

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou et al.

ICCV 2025 · arXiv:2504.10483 · 85 citations

REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.

NeurIPS 2025 · 8 citations

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang et al.

ICLR 2025 · arXiv:2410.06940 · 342 citations

Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

Ruichen Chen, Keith Mills, Liyao Jiang et al.

NeurIPS 2025 (oral) · arXiv:2505.22918 · 1 citation

RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

Ahmet Berke Gökmen, Yiğit Ekin, Bahri Batuhan Bilecen et al.

NeurIPS 2025 · arXiv:2505.13344 · 3 citations

Sampling 3D Molecular Conformers with Diffusion Transformers

J. Thorben Frank, Winfried Ripken, Gregor Lied et al.

NeurIPS 2025 · arXiv:2506.15378 · 2 citations

Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers

Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.

NeurIPS 2025 · 6 citations

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Shuning Chang, Pichao Wang, Jiasheng Tang et al.

NeurIPS 2025 (oral) · arXiv:2412.06028 · 3 citations

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NeurIPS 2025 (spotlight) · arXiv:2505.18875 · 40 citations

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025 · arXiv:2411.17616 · 3 citations

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Yifei Xia, Suhan Ling, Fangcheng Fu et al.

ICCV 2025 · arXiv:2502.21079 · 35 citations

Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations

Chaofan Gan, Yuanpeng Tu, Xi Chen et al.

NeurIPS 2025 · arXiv:2505.18584 · 5 citations

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Han Lin, Tushar Nagarajan, Nicolas Ballas et al.

ICLR 2025 · arXiv:2410.03478 · 7 citations

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

Qinkai Xu, Yijin Liu, Yang Chen et al.

NeurIPS 2025 (oral)

Video Motion Transfer with Diffusion Transformers

Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.

CVPR 2025 · arXiv:2412.07776 · 20 citations

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 · arXiv:2408.17131 · 11 citations

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

Shentong Mo, Enze Xie, Yue Wu et al.

ECCV 2024 · arXiv:2312.07231 · 7 citations