"multi-modal inputs" Papers
4 papers found
Conference
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu, Yi Jiang, Qihao Liu et al.
CVPR 2024 (highlight), arXiv:2312.09158
82 citations
Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
Aayam Shrestha, Pan Liu, German Ros et al.
ECCV 2024, arXiv:2502.05641
10 citations
Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.
CVPR 2024, arXiv:2404.11699
28 citations
Unleashing Network Potentials for Semantic Scene Completion
Fengyun Wang, Qianru Sun, Dong Zhang et al.
CVPR 2024, arXiv:2403.07560
5 citations