"temporal modeling" Papers

25 papers found

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Wenhao Chai, Enxin Song, Yilun Du et al.

ICLR 2025 (oral), arXiv:2410.03051, 105 citations

CSV-Occ: Fusing Multi-frame Alignment for Occupancy Prediction with Temporal Cross State Space Model and Central Voting Mechanism

Ziming Zhu, Yu Zhu, Jiahao Chen et al.

ICML 2025 (oral)

DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification

Darryl Ho, Samuel Madden

CVPR 2025, arXiv:2506.12585

Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking

Hyunseop Kim, Juheon Jeong, Hanul Kim et al.

NeurIPS 2025 (oral)

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Jun Li, Jinpeng Wang, Chaolei Tan et al.

ICCV 2025, arXiv:2507.17402, 4 citations

FLAME: Fast Long-context Adaptive Memory for Event-based Vision

Biswadeep Chakraborty, Saibal Mukhopadhyay

NeurIPS 2025 (oral)

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

Jingyi Yang, Zitong Yu, Nixiuming et al.

ICLR 2025 (oral), arXiv:2502.03549, 3 citations

M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast

Jiacheng Lu, Hui Ding, Shiyu Zhang et al.

ICCV 2025, arXiv:2507.20582, 1 citation

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025 (paper), arXiv:2412.13611, 26 citations

STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.

NeurIPS 2025 (oral), arXiv:2505.11151, 3 citations

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation

Yabiao Wang, Shuo Wang, Jiangning Zhang et al.

CVPR 2025, arXiv:2408.17135, 9 citations

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025, 6 citations

Unhackable Temporal Reward for Scalable Video MLLMs

En Yu, Kangheng Lin, Liang Zhao et al.

ICLR 2025 (oral), arXiv:2502.12081, 22 citations

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

Jiangning Wei, Lixiong Qin, Bo Yu et al.

AAAI 2025 (paper), arXiv:2503.11004, 5 citations

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NeurIPS 2025 (oral), arXiv:2503.21776, 257 citations

ViLLa: Video Reasoning Segmentation with Large Language Model

Rongkun Zheng, Lu Qi, Xi Chen et al.

ICCV 2025, arXiv:2407.14500, 17 citations

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Huicong Zhang, Haozhe Xie, Hongxun Yao

CVPR 2024, arXiv:2406.07551, 18 citations

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024, arXiv:2402.14780, 48 citations

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024, arXiv:2404.03384, 131 citations

Motion Mamba: Efficient and Long Sequence Motion Generation

Zeyu Zhang, Akide Liu, Ian Reid et al.

ECCV 2024, arXiv:2403.07487, 114 citations

Open Vocabulary Multi-Label Video Classification

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.

ECCV 2024, arXiv:2407.09073, 5 citations

Rethinking Image-to-Video Adaptation: An Object-centric Perspective

Rui Qian, Shuangrui Ding, Dahua Lin

ECCV 2024, arXiv:2407.06871, 8 citations

Stream Query Denoising for Vectorized HD-Map Construction

Shuo Wang, Fan Jia, Weixin Mao et al.

ECCV 2024, arXiv:2401.09112, 42 citations

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

Anna Kukleva, Fadime Sener, Edoardo Remelli et al.

CVPR 2024, arXiv:2403.19811, 5 citations

ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video

Xinhao Li, Yuhan Zhu, Limin Wang

ECCV 2024, arXiv:2310.01324, 19 citations