"spatial-temporal understanding" Papers
2 papers found
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
Tianyu Fu, Tengxuan Liu, Qinghao Han et al.
ICCV 2025 | arXiv:2501.01986 | 24 citations
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li, Yiming Zhang, Tao Lin et al.
ICCV 2025 | arXiv:2503.23765 | 38 citations