Papers by Yansong Shi
2 papers found
Conference
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng, Kunchang Li, Chenting Wang et al.
ICLR 2025 (oral), arXiv:2410.19702
67 citations
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Yi Wang, Kunchang Li, Xinhao Li et al.
ECCV 2024, arXiv:2403.15377
236 citations