"multimodal generation" Papers
14 papers found
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
CVPR 2025 | arXiv:2504.14666
20 citations
Generator Matching: Generative modeling with arbitrary Markov processes
Peter Holderrieth, Marton Havasi, Jason Yim et al.
ICLR 2025 | arXiv:2410.20587
46 citations
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
CVPR 2025 | arXiv:2503.19157
12 citations
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
Weijia Shi, Xiaochuang Han, Chunting Zhou et al.
NeurIPS 2025 | arXiv:2412.15188
86 citations
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
ICLR 2025 (oral)
9 citations
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang, Hui Chen, Jianchao Tan et al.
NeurIPS 2025 | arXiv:2412.03409
6 citations
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen et al.
ICCV 2025 | arXiv:2405.20336
3 citations
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie, Zhenheng Yang, Mike Zheng Shou
NeurIPS 2025 (oral) | arXiv:2506.15564
106 citations
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie, Weijia Mao, Zechen Bai et al.
ICLR 2025 | arXiv:2408.12528
484 citations
UniMuMo: Unified Text, Music, and Motion Generation
Han Yang, Kun Su, Yutong Zhang et al.
AAAI 2025 | arXiv:2410.04534
12 citations
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.
CVPR 2025 | arXiv:2406.04321
32 citations
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Zehan Wang, Ziang Zhang, Xize Cheng et al.
ICML 2024 | arXiv:2405.04883
19 citations
Instant 3D Human Avatar Generation using Image Diffusion Models
Nikos Kolotouros, Thiemo Alldieck, Enric Corona et al.
ECCV 2024 | arXiv:2406.07516
15 citations
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding
Guangyi Liu, Yu Wang, Zeyu Feng et al.
ICML 2024 | arXiv:2402.19009
8 citations