"video-language benchmarks" Papers
3 papers found
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
Shuming Liu, Chen Zhao, Tianqi Xu et al.
CVPR 2025, arXiv:2503.21483
28 citations
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.
CVPR 2025 (highlight), arXiv:2503.05936
4 citations
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
CVPR 2024, arXiv:2401.06129
21 citations