"video-language understanding" Papers
6 papers found
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding
Xiaolong Sun, Liushuai Shi, Le Wang et al.
AAAI 2025 · arXiv:2406.00143 · 5 citations
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
CVPR 2025 · arXiv:2503.02175 · 57 citations
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho et al.
ICCV 2025 · arXiv:2504.07454 · 3 citations
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang, Zhengxuan Wei, Yuchen Zhu et al.
ICCV 2025 · arXiv:2509.23867 · 2 citations
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.
CVPR 2025 · arXiv:2412.18609 · 2 citations
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
CVPR 2024 (highlight) · arXiv:2309.01327 · 113 citations