"spatial-temporal reasoning" Papers
4 papers found
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
CVPR 2025, arXiv:2408.00754, 9 citations
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang, Guikun Chen, Xiaodi Li et al.
ICML 2024 (oral), arXiv:2401.08392, 64 citations
FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou et al.
ECCV 2024, arXiv:2306.14899, 36 citations
Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering
Haoyu Zhang, Meng Liu, Zixin Liu et al.
ICML 2024 (oral)