"action recognition" Papers
25 papers found
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
Ege Özsoy, Arda Mamur, Felix Tristram et al.
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
Ali Rasekh, Erfan Soula, Omid Daliran et al. (Leibniz University Hannover, L3S Research Center)
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang, Xiaoming Liu, Yu Kong
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
Jingyi Yang, Zitong Yu, Xiuming Ni et al.
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
Huaying Yuan, Jian Ni, Zheng Liu et al.
OSKAR: Omnimodal Self-supervised Knowledge Abstraction and Representation
Mohamed Abdelfattah, Kaouther Messaoud, Alexandre Alahi
PASS: Path-selective State Space Model for Event-based Recognition
Jiazhou Zhou, Kanghao Chen, Lei Zhang et al.
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues
Sihong Huang, Jiaxin Wu, Xiaoyong Wei et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang, Zilin Gao, Qilong Wang et al.
VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention
Jiangning Wei, Lixiong Qin, Bo Yu et al.
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin et al.
Data Collection-free Masked Video Modeling
Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li, Xingao Li, Changxing Ding et al.
Generative Model-Based Feature Knowledge Distillation for Action Recognition
Guiqin Wang, Peng Zhao, Yanjiang Shi et al.
Koala: Key Frame-Conditioned Long Video-LLM
Reuben Tan, Ximeng Sun, Ping Hu et al.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li, Aditya Grover, Harkanwar Singh
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
Lingni Ma, Yuting Ye, Rowan Postyeni et al.
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian, Shuangrui Ding, Dahua Lin
Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics
Xueyuan Yang, Chao Yao, Xiaojuan Ban
Taylor Videos for Action Recognition
Lei Wang, Xiuyuan Yuan, Tom Gedeon et al.
Text-Guided Video Masked Autoencoder
David Fan, Jue Wang, Shuai Liao et al.
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
Linglin Jing, Ying Xue, Xu Yan et al.