"multimodal generation" Papers

14 papers found

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Kaihang Pan, Wang Lin, Zhongqi Yue et al.

CVPR 2025arXiv:2504.14666
20
citations

Generator Matching: Generative modeling with arbitrary Markov processes

Peter Holderrieth, Marton Havasi, Jason Yim et al.

ICLR 2025arXiv:2410.20587
46
citations

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models

Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.

CVPR 2025arXiv:2503.19157
12
citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025arXiv:2412.15188
86
citations

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

Gao Peng, Le Zhuo, Dongyang Liu et al.

ICLR 2025oral
9
citations

PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation

Ao Wang, Hui Chen, Jianchao Tan et al.

NEURIPS 2025arXiv:2412.03409
6
citations

RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text

Jiaben Chen, Xin Yan, Yihang Chen et al.

ICCV 2025arXiv:2405.20336
3
citations

Show-o2: Improved Native Unified Multimodal Models

Jinheng Xie, Zhenheng Yang, Mike Zheng Shou

NEURIPS 2025oralarXiv:2506.15564
106
citations

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie, Weijia Mao, Zechen Bai et al.

ICLR 2025arXiv:2408.12528
484
citations

UniMuMo: Unified Text, Music, and Motion Generation

Han Yang, Kun Su, Yutong Zhang et al.

AAAI 2025paperarXiv:2410.04534
12
citations

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025arXiv:2406.04321
32
citations

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

Zehan Wang, Ziang Zhang, xize cheng et al.

ICML 2024arXiv:2405.04883
19
citations

Instant 3D Human Avatar Generation using Image Diffusion Models

Nikos Kolotouros, Thiemo Alldieck, Enric Corona et al.

ECCV 2024arXiv:2406.07516
15
citations

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

Guangyi Liu, Yu Wang, Zeyu Feng et al.

ICML 2024arXiv:2402.19009
8
citations