"temporal reasoning" Papers

20 papers found

Contrastive Representations for Temporal Reasoning

Alicja Ziarko, Michał Bortkiewicz, Michał Zawalski et al.

NeurIPS 2025 (oral), arXiv:2508.13113
3 citations

DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation

Mu Chen, Liulei Li, Wenguan Wang et al.

CVPR 2025, arXiv:2503.13957
5 citations

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs

Yuping He, Yifei Huang, Guo Chen et al.

NeurIPS 2025 (oral), arXiv:2507.18342
11 citations

Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders

Ali Rasekh, Erfan Soula, Omid Daliran et al. (Leibniz University Hannover, L3S Research Center)

NeurIPS 2025 (oral), arXiv:2510.26027
1 citation

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan et al.

ICCV 2025, arXiv:2506.21188
2 citations

HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos

Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

ICCV 2025, arXiv:2505.12911
1 citation

Improve Temporal Reasoning in Multimodal Large Language Models via Video Contrastive Decoding

Daiqing Qi, Dongliang Guo, Hanzhang Yuan et al.

NeurIPS 2025 (oral)

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu et al.

ICLR 2025 (oral), arXiv:2410.10813
128 citations

Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames

Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.

NeurIPS 2025 (oral), arXiv:2507.02001
9 citations

Temporal Reasoning Transfer from Text to Video

Lei Li, Yuanxin Liu, Linli Yao et al.

ICLR 2025 (oral), arXiv:2410.06166
21 citations

TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

Jeremy Irvin, Emily Liu, Joyce Chen et al.

ICLR 2025 (oral), arXiv:2410.06234
45 citations

Two Causally Related Needles in a Video Haystack

Miaoyu Li, Qin Chao, Boyang Li

NeurIPS 2025, arXiv:2505.19853

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Limeng Qiao, Yiyang Gan, Bairui Wang et al.

NeurIPS 2025 (oral)
3 citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025, arXiv:2506.10857
9 citations

Weakly Supervised Video Scene Graph Generation via Natural Language Supervision

Kibum Kim, Kanghoon Yoon, Yeonjun In et al.

ICLR 2025 (oral), arXiv:2502.15370
2 citations

Generalized Predictive Model for Autonomous Driving

Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.

CVPR 2024 (highlight), arXiv:2403.09630
128 citations

History Matters: Temporal Knowledge Editing in Large Language Model

Xunjian Yin, Jin Jiang, Liming Yang et al.

AAAI 2024 (paper), arXiv:2312.05497
16 citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Long Qian, Juncheng Li, Yu Wu et al.

ICML 2024 (oral), arXiv:2402.11435
104 citations

RMem: Restricted Memory Banks Improve Video Object Segmentation

Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

CVPR 2024, arXiv:2406.08476
20 citations

Towards Neuro-Symbolic Video Understanding

Minkyu Choi, Harsh Goel, Mohammad Omama et al.

ECCV 2024, arXiv:2403.11021
19 citations