"action recognition" Papers

25 papers found

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition

Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah

ICLR 2025 · arXiv:2502.00156 · 3 citations

EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception

Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.

ICCV 2025 · arXiv:2506.21080 · 2 citations

EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding

Ege Özsoy, Arda Mamur, Felix Tristram et al.

NeurIPS 2025 · arXiv:2505.24287 · 5 citations

Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders

Ali Rasekh, Erfan Soula, Omid Daliran et al.

NeurIPS 2025 (oral) · arXiv:2510.26027 · 1 citation

From Image to Video: An Empirical Study of Diffusion Representations

Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.

ICCV 2025 (highlight) · arXiv:2502.07001 · 4 citations

H-MoRe: Learning Human-centric Motion Representation for Action Analysis

Zhanbo Huang, Xiaoming Liu, Yu Kong

CVPR 2025 (highlight) · arXiv:2504.10676 · 4 citations

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

Jingyi Yang, Zitong Yu, Nixiuming et al.

ICLR 2025 (oral) · arXiv:2502.03549 · 3 citations

MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval

Huaying Yuan, Jian Ni, Zheng Liu et al.

NeurIPS 2025 · arXiv:2502.12558 · 3 citations

OSKAR: Omnimodal Self-supervised Knowledge Abstraction and Representation

Mohamed Abdelfattah, Kaouther Messaoud, Alexandre Alahi

NeurIPS 2025

PASS: Path-selective State Space Model for Event-based Recognition

Jiazhou Zhou, Kanghao Chen, Lei Zhang et al.

NeurIPS 2025 (oral) · arXiv:2409.16953 · 1 citation

Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues

Sihong Huang, Jiaxin Wu, Xiaoyong Wei et al.

CVPR 2025 · 2 citations

TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition

Yilong Wang, Zilin Gao, Qilong Wang et al.

CVPR 2025 · arXiv:2411.19041 · 3 citations

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

Jiangning Wei, Lixiong Qin, Bo Yu et al.

AAAI 2025 · arXiv:2503.11004 · 5 citations

DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

Qi Wang, Zhou Xu, Yuming Lin et al.

ECCV 2024 · arXiv:2407.05106 · 16 citations

Data Collection-free Masked Video Modeling

Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki

ECCV 2024 · arXiv:2409.06665 · 1 citation

Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li, Xingao Li, Changxing Ding et al.

CVPR 2024 · arXiv:2404.01725 · 11 citations

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024 · arXiv:2312.08644 · 7 citations

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024 (highlight) · arXiv:2404.04346 · 64 citations

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024 · arXiv:2402.05892 · 106 citations

Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild

Lingni Ma, Yuting Ye, Rowan Postyeni et al.

ECCV 2024

Rethinking Image-to-Video Adaptation: An Object-centric Perspective

Rui Qian, Shuangrui Ding, Dahua Lin

ECCV 2024 · arXiv:2407.06871 · 8 citations

Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

Xueyuan Yang, Chao Yao, Xiaojuan Ban

AAAI 2024 · arXiv:2401.05412 · 4 citations

Taylor Videos for Action Recognition

Lei Wang, Xiuyuan Yuan, Tom Gedeon et al.

ICML 2024 (oral) · arXiv:2402.03019 · 13 citations

Text-Guided Video Masked Autoencoder

David Fan, Jue Wang, Shuai Liao et al.

ECCV 2024 · arXiv:2408.00759 · 7 citations

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

Linglin Jing, Ying Xue, Xu Yan et al.

AAAI 2024 · arXiv:2312.07378 · 11 citations