"visual-language integration" Papers
3 papers found
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng et al.
ICCV 2025, arXiv:2503.08145
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
Jingli Lin, Chenming Zhu, Runsen Xu et al.
NeurIPS 2025 (oral), arXiv:2507.07984
7 citations
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar, Yongqin Xian, Alessio Tonioni et al.
ECCV 2024, arXiv:2312.11897
24 citations