"cross-modal generation" Papers
5 papers found
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
CVPR 2025 (highlight) · arXiv:2412.15213
12 citations
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
Lei Hu, Yongjing Ye, Shihong Xia
NeurIPS 2025
UniMuMo: Unified Text, Music, and Motion Generation
Han Yang, Kun Su, Yutong Zhang et al.
AAAI 2025 · arXiv:2410.04534
12 citations
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
ICLR 2025 · arXiv:2410.10594
127 citations
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
AAAI 2024 · arXiv:2308.09300
75 citations