"video caption generation" Papers
4 papers found
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
ICCV 2025, arXiv:2503.10781, 3 citations
SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living
Arkaprava Sinha, Dominick Reilly, Francois Bremond et al.
AAAI 2025, arXiv:2502.03459, 2 citations
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Zixuan Xiong, Guangwei Xu, Wenkai Zhang et al.
ICLR 2025
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
CVPR 2024, arXiv:2401.06129, 21 citations