"video representation learning" Papers
18 papers found
Conference
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei, Yifei Huang, Jilan Xu et al.
MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
Mohammadreza Salehi, Shashanka Venkataramanan, Ioana Simion et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
yilong wang, Zilin Gao, Qilong Wang et al.
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
What Do Latent Action Models Actually Learn?
Chuheng Zhang, Tim Pearce, Pushi Zhang et al.
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sunil Hwang, Jaehong Yoon, Youngwan Lee et al.
Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
Minh Tran, Yelin Kim, Che-Chun Su et al.
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
Sequential Disentanglement by Extracting Static Information From A Single Sequence Element
Nimrod Berman, Ilan Naiman, Idan Arbiv et al.
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
Mathieu Simon, Pascal Frossard, Christophe De Vleeschouwer
STSP: Spatial-Temporal Subspace Projection for Video Class-incremental Learning
Hao CHENG, SIYUAN YANG, Chong Wang et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad