"multi-modal video understanding" Papers
2 papers found
Conference
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
CVPR 2025, arXiv:2411.19772
34 citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He et al.
CVPR 2024 (highlight), arXiv:2311.17005
902 citations