"text-to-video generation" Papers

42 papers found

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

Hui Zhang, Zuxuan Wu, Zhen Xing et al.

AAAI 2025 · arXiv:2311.14768 · 20 citations

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025 · arXiv:2501.07647 · 11 citations

Can Text-to-Video Generation help Video-Language Alignment?

Luca Zanella, Massimiliano Mancini, Willi Menapace et al.

CVPR 2025 · arXiv:2503.18507 · 1 citation

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng et al.

ICLR 2025 (oral) · arXiv:2408.06072 · 1409 citations

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.

NeurIPS 2025 (oral) · arXiv:2506.03517 · 14 citations

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

Donglin Di, He Feng, Wenzhang Sun et al.

ICCV 2025 · arXiv:2410.07151 · 5 citations

EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation

Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri

CVPR 2025 · arXiv:2504.06861 · 2 citations

From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition

Ling Lo, Kelvin Chan, Wen-Huang Cheng et al.

ICCV 2025 · arXiv:2509.19690 · 1 citation

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian Ge, Yuqi Zhang et al.

CVPR 2025 (highlight) · arXiv:2502.04896 · 54 citations

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu et al.

CVPR 2025 · arXiv:2503.23715 · 14 citations

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NeurIPS 2025 · arXiv:2501.19252 · 22 citations

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Tiehan Fan, Kepan Nan, Rui Xie et al.

CVPR 2025 · arXiv:2412.09283 · 14 citations

MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers

Yuechen Zhang, Yaoyang Liu, Bin Xia et al.

ICCV 2025 · arXiv:2501.03931 · 22 citations

Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.

CVPR 2025 · 8 citations

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng et al.

CVPR 2025 · arXiv:2412.03085 · 16 citations

Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM

Zirui Pan, Xin Wang, Yipeng Zhang et al.

AAAI 2025 · arXiv:2504.12048 · 5 citations

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

Yanchen Liu, Yanan Sun, Zhening Xing et al.

ICCV 2025 · arXiv:2507.16310 · 2 citations

OmniGen-AR: AutoRegressive Any-to-Image Generation

Junke Wang, Xun Wang, Qiushan Guo et al.

NeurIPS 2025

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Kepan Nan, Rui Xie, Penghao Zhou et al.

ICLR 2025 · arXiv:2407.02371 · 207 citations

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation

Hengjia Li, Haonan Qiu, Shiwei Zhang et al.

ICCV 2025 · arXiv:2411.17048 · 19 citations

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

Yitian Zhang, Long Mai, Aniruddha Mahapatra et al.

ICCV 2025 · arXiv:2503.08665 · 1 citation

STDD: Spatio-Temporal Dual Diffusion for Video Generation

Shuaizhen Yao, Xiaoya Zhang, Xin Liu et al.

CVPR 2025 · 2 citations

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Roberto Henschel, Levon Khachatryan, Hayk Poghosyan et al.

CVPR 2025 · arXiv:2403.14773 · 164 citations

T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks

Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.

NeurIPS 2025 · arXiv:2505.06679 · 6 citations

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Aoxiong Yin, Kai Shen, Yichong Leng et al.

ICCV 2025 · arXiv:2503.04606

Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

Rohit Kundu, Hao Xiong, Vishal Mohanty et al.

CVPR 2025 · arXiv:2412.12278 · 13 citations

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Gihyun Kwon, Jong Chul Ye

ICLR 2025 · arXiv:2410.05591 · 11 citations

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025 · arXiv:2412.14167 · 75 citations

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Dohun Lee, Bryan Sangwoo Kim, Geon Yeong Park et al.

CVPR 2025 · arXiv:2410.04364 · 2 citations

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.

CVPR 2025 · arXiv:2503.21781 · 7 citations

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NeurIPS 2025 · arXiv:2503.01739 · 11 citations

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025 · arXiv:2503.20491 · 13 citations

WISA: World simulator assistant for physics-aware text-to-video generation

Jing Wang, Ao Ma, Ke Cao et al.

NeurIPS 2025 (spotlight) · arXiv:2503.08153 · 35 citations

ZeroPatcher: Training-free Sampler for Video Inpainting and Editing

Shaoshu Yang, Yingya Zhang, Ran He

NeurIPS 2025

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing He, Xiaodong Cun et al.

AAAI 2024 · arXiv:2304.01186 · 284 citations

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Tianxing Wu, Chenyang Si, Yuming Jiang et al.

ECCV 2024 · arXiv:2312.07537 · 81 citations

Grid Diffusion Models for Text-to-Video Generation

Taegyeong Lee, Soyeong Kwon, Taehwan Kim

CVPR 2024 · arXiv:2404.00234 · 21 citations

MoVideo: Motion-Aware Video Generation with Diffusion Models

Jingyun Liang, Yuchen Fan, Kai Zhang et al.

ECCV 2024 · arXiv:2311.11325 · 14 citations

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024 · arXiv:2312.06662 · 278 citations

SAVE: Protagonist Diversification with Structure Agnostic Video Editing

Yeji Song, Wonsik Shin, Junsoo Lee et al.

ECCV 2024 · arXiv:2312.02503 · 11 citations

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ECCV 2024 · arXiv:2311.16933 · 173 citations

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

CVPR 2024 · arXiv:2404.16306 · 22 citations