Papers by Qi Xiaojuan
6 papers found
Conference
Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao et al.
ECCV 2024, arXiv:2409.05162
13 citations
EA-VTR: Event-Aware Video-Text Retrieval
Zongyang Ma, Ziqi Zhang, Yuxin Chen et al.
ECCV 2024, arXiv:2407.07478
7 citations
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
ECCV 2024, arXiv:2404.13013
107 citations
Let the Avatar Talk using Texts without Paired Training Data
Xiuzhe Wu, Yang-Tian Sun, Handi Chen et al.
ECCV 2024
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin et al.
ECCV 2024, arXiv:2312.08754
51 citations
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan YANG, Runyu Ding, Ellis L Brown et al.
ECCV 2024, arXiv:2402.03310
36 citations