Gedas Bertasius
22 papers, 1,951 total citations

Papers (22)
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. CVPR 2024. 343 citations.
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation. CVPR 2020. 191 citations.
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos. CVPR 2025. 156 citations.
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers. ICCV 2023. 152 citations.
Long Movie Clip Classification with State-Space Video Models. ECCV 2022. 141 citations.
TALLFormer: Temporal Action Localization with a Long-Memory Transformer. ECCV 2022. 121 citations.
Vision Transformers Are Parameter-Efficient Audio-Visual Learners. CVPR 2023. 112 citations.
Learning To Recognize Procedural Activities With Distant Supervision. CVPR 2022. 98 citations.
VindLU: A Recipe for Effective Video-and-Language Pretraining. CVPR 2023. 92 citations.
Video ReCap: Recursive Captioning of Hour-Long Videos. CVPR 2024. 85 citations.
Unified Coarse-to-Fine Alignment for Video-Text Retrieval. ICCV 2023. 78 citations.
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs. CVPR 2021. 74 citations.
Efficient Movie Scene Detection Using State-Space Transformers. CVPR 2023. 70 citations.
Long-Short Temporal Contrastive Learning of Video Transformers. CVPR 2022. 56 citations.
ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound. ECCV 2022. 56 citations.
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding. NeurIPS 2025. 29 citations.
LoCoNet: Long-Short Context Network for Active Speaker Detection. CVPR 2024. 28 citations.
COBE: Contextualized Object Embeddings from Narrated Instructional Video. NeurIPS 2020. 27 citations.
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering. CVPR 2025. 12 citations.
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation. CVPR 2025. 11 citations.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos. ECCV 2024. 10 citations.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. CVPR 2025. 9 citations.