Poster "cross-modal attention" Papers
4 papers found
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
ICCV 2025, arXiv:2508.07519
5 citations
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
ICCV 2025, arXiv:2509.18733
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
ICML 2024, arXiv:2406.01210
51 citations
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li et al.
ECCV 2024, arXiv:2407.10937
11 citations