Poster "cross-modal supervision" Papers
2 papers found
OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
Dongsheng Yuan, Xie Zhang, Weiying Hou et al.
NeurIPS 2025
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao, Guanbin Li, Weikai Chen et al.
CVPR 2024, arXiv:2403.17334
15 citations