"cross-modal attention" Papers
5 papers found
Conference
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
ICCV 2025arXiv:2508.07519
5
citations
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
ICCV 2025arXiv:2509.18733
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
ICML 2024arXiv:2406.01210
51
citations
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li et al.
ECCV 2024arXiv:2407.10937
11
citations
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
Siyu Zou, Jiji Tang, Yiyi Zhou et al.
AAAI 2024paperarXiv:2401.07709
20
citations