"video generation" Papers

104 papers found • Page 2 of 3

PlayerOne: Egocentric World Simulator

Yuanpeng Tu, Hao Luo, Xi Chen et al.

NEURIPS 2025 (oral) • arXiv:2506.09995
4 citations

Pyramidal Flow Matching for Efficient Video Generative Modeling

Yang Jin, Zhicheng Sun, Ningyuan Li et al.

ICLR 2025 (oral) • arXiv:2410.05954
227 citations

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents

Rui Tian, Qi Dai, Jianmin Bao et al.

ICCV 2025 • arXiv:2411.13552
7 citations

Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

Ruichen Chen, Keith Mills, Liyao Jiang et al.

NEURIPS 2025 (oral) • arXiv:2505.22918
1 citation

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

Tianyi Yan, Wencheng Han, Xia Zhou et al.

NEURIPS 2025 • arXiv:2509.16500
4 citations

RoboScape: Physics-informed Embodied World Model

Yu Shang, Xin Zhang, Yinzhou Tang et al.

NEURIPS 2025 (oral) • arXiv:2506.23135
18 citations

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Teng Hu, Jiangning Zhang, Ran Yi et al.

ICLR 2025 • arXiv:2409.06633
1 citation

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi, Penghui Ruan, Marco Chen et al.

NEURIPS 2025 • arXiv:2502.06734
27 citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NEURIPS 2025 • arXiv:2506.21356
7 citations

Show-o2: Improved Native Unified Multimodal Models

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

NEURIPS 2025 (oral) • arXiv:2506.15564
106 citations

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Yining Hong, Beide Liu, Maxine Wu et al.

ICLR 2025 (oral) • arXiv:2410.23277
19 citations

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Shuning Chang, Pichao Wang, Jiasheng Tang et al.

NEURIPS 2025 (oral) • arXiv:2412.06028
3 citations

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NEURIPS 2025 (spotlight) • arXiv:2505.18875
40 citations

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu, Zhen Xing, Xintong Han et al.

CVPR 2025 • arXiv:2411.17697
64 citations

Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation

Agneet Chatterjee, Rahim Entezari, Maksym Zhuravinskyi et al.

NEURIPS 2025 • arXiv:2509.26555

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Jensen Zhou, Hang Gao, Vikram Voleti et al.

ICCV 2025 • arXiv:2503.14489
87 citations

STDD: Spatio-Temporal Dual Diffusion for Video Generation

Shuaizhen Yao, Xiaoya Zhang, Xin Liu et al.

CVPR 2025
2 citations

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering

Byeongjun Park, Hyojun Go, Hyelin Nam et al.

ICCV 2025 • arXiv:2503.12024
5 citations

SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios

Lingwei Dang, Ruizhi Shao, Hongwen Zhang et al.

NEURIPS 2025 (spotlight) • arXiv:2506.02444
3 citations

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization

Zhentao Tan, Ben Xue, Jian Jia et al.

ICCV 2025 • arXiv:2412.10443
6 citations

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.

CVPR 2025 • arXiv:2412.00719
7 citations

Taming Teacher Forcing for Masked Autoregressive Video Generation

Deyu Zhou, Quan Sun, Yuang Peng et al.

CVPR 2025 • arXiv:2501.12389
20 citations

TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation

Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.

CVPR 2025 • arXiv:2503.11423
22 citations

TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models

Haocheng Huang, Jiaxin Chen, Jinyang Guo et al.

AAAI 2025 • arXiv:2412.16700
3 citations

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Zhenghao Zhang, Junchao Liao, Menghao Li et al.

CVPR 2025 • arXiv:2407.21705
115 citations

Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach

Yunuo Chen, Junli Cao, Vidit Goel et al.

NEURIPS 2025 • arXiv:2502.03639
8 citations

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.

ICCV 2025 • arXiv:2411.17616
3 citations

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025 • arXiv:2412.06016
33 citations

Trajectory Attention for Fine-grained Video Motion Control

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.

ICLR 2025 (oral) • arXiv:2411.19324
40 citations

UniScene: Unified Occupancy-centric Driving Scene Generation

Bohan Li, Jiazhe Guo, Hongsi Liu et al.

CVPR 2025 • arXiv:2412.05435
64 citations

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

Qinkai Xu, Yijin Liu, Yang Chen et al.

NEURIPS 2025 (oral)

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

Tianxiong Zhong, Xingye Tian, Boyuan Jiang et al.

NEURIPS 2025 (oral) • arXiv:2505.12053
3 citations

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Wentao Zhang, Junliang Guo, Tianyu He et al.

ICLR 2025 • arXiv:2407.07356
7 citations

VideoPhy: Evaluating Physical Commonsense for Video Generation

Hritik Bansal, Zongyu Lin, Tianyi Xie et al.

ICLR 2025 • arXiv:2406.03520
106 citations

Video-T1: Test-time Scaling for Video Generation

Fangfu Liu, Hanyang Wang, Yimo Cai et al.

ICCV 2025 • arXiv:2503.18942
20 citations

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.

NEURIPS 2025 • arXiv:2505.18809
12 citations

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 • arXiv:2408.17131
11 citations

ZeroPatcher: Training-free Sampler for Video Inpainting and Editing

Shaoshu Yang, Yingya Zhang, Ran He

NEURIPS 2025

BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering

Xinmin Qiu, Congying Han, Zicheng Zhang et al.

ECCV 2024 • arXiv:2403.06243

Boximator: Generating Rich and Controllable Motions for Video Synthesis

Jiawei Wang, Yuchen Zhang, Jiaxin Zou et al.

ICML 2024 • arXiv:2402.01566
82 citations

DNI: Dilutional Noise Initialization for Diffusion Video Editing

Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong et al.

ECCV 2024 • arXiv:2409.13037
10 citations

Explorative Inbetweening of Time and Space

Haiwen Feng, Zheng Ding, Zhihao Xia et al.

ECCV 2024 • arXiv:2403.14611
12 citations

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.

CVPR 2024 • arXiv:2312.01409
26 citations

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation

Kihong Kim, Haneol Lee, Jihye Park et al.

ECCV 2024 • arXiv:2402.13729
11 citations

Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation

Aram Davtyan, Paolo Favaro

AAAI 2024 • arXiv:2306.03988
7 citations

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

Ziyao Huang, Fan Tang, Yong Zhang et al.

CVPR 2024 • arXiv:2403.16510
30 citations

MoVideo: Motion-Aware Video Generation with Diffusion Models

Jingyun Liang, Yuchen Fan, Kai Zhang et al.

ECCV 2024 • arXiv:2311.11325
14 citations

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024 • arXiv:2312.06662
278 citations

Position: Video as the New Language for Real-World Decision Making

Sherry Yang, Jacob C Walker, Jack Parker-Holder et al.

ICML 2024

RoboDreamer: Learning Compositional World Models for Robot Imagination

Siyuan Zhou, Yilun Du, Jiaben Chen et al.

ICML 2024 • arXiv:2404.12377
107 citations