"temporal modeling" Papers
25 papers found
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
CSV-Occ: Fusing Multi-frame Alignment for Occupancy Prediction with Temporal Cross State Space Model and Central Voting Mechanism
Ziming Zhu, Yu Zhu, Jiahao Chen et al.
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho, Samuel Madden
Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking
Hyunseop Kim, Juheon Jeong, Hanul Kim et al.
Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
Jun Li, Jinpeng Wang, Chaolei Tan et al.
FLAME: Fast Long-context Adaptive Memory for Event-based Vision
Biswadeep Chakraborty, Saibal Mukhopadhyay
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
Jingyi Yang, Zitong Yu, Xiuming Ni et al.
M-Net: MRI Brain Tumor Sequential Segmentation Network via Mesh-Cast
Jiacheng Lu, Hui Ding, Shiyu Zhang et al.
Robust Tracking via Mamba-based Context-aware Token Learning
Jinxia Xie, Bineng Zhong, Qihua Liang et al.
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
Unhackable Temporal Reward for Scalable Video MLLMs
En Yu, Kangheng Lin, Liang Zhao et al.
VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention
Jiangning Wei, Lixiong Qin, Bo Yu et al.
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen et al.
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang et al.
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng, Mingfei Han, Haoyu He et al.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid et al.
Open Vocabulary Multi-Label Video Classification
Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian, Shuangrui Ding, Dahua Lin
Stream Query Denoising for Vectorized HD-Map Construction
Shuo Wang, Fan Jia, Weixin Mao et al.
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva, Fadime Sener, Edoardo Remelli et al.
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
Xinhao Li, Yuhan Zhu, Limin Wang