Lorenzo Torresani
26 papers, 3,560 total citations
papers (26)
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
arXiv
1,511
citations
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
NEURIPS 2020
arXiv
462
citations
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
arXiv
343
citations
Listen to Look: Action Recognition by Previewing Audio
CVPR 2020
arXiv
285
citations
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
CVPR 2020
arXiv
191
citations
Video Modeling With Correlation Networks
CVPR 2020
arXiv
144
citations
Learning To Recognize Procedural Activities With Distant Supervision
CVPR 2022
arXiv
98
citations
Video ReCap: Recursive Captioning of Hour-Long Videos
CVPR 2024
arXiv
85
citations
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021
arXiv
74
citations
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023
arXiv
74
citations
Long-Short Temporal Contrastive Learning of Video Transformers
CVPR 2022
arXiv
56
citations
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
NEURIPS 2025
arXiv
47
citations
Ego-Only: Egocentric Action Detection without Exocentric Transferring
ICCV 2023
arXiv
38
citations
Deformable Video Transformer
CVPR 2022
arXiv
32
citations
COBE: Contextualized Object Embeddings from Narrated Instructional Video
NEURIPS 2020
arXiv
27
citations
Learning to Ground Instructional Articles in Videos through Narrations
ICCV 2023
arXiv
27
citations
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
CVPR 2021
arXiv
23
citations
Egocentric Video Task Translation
CVPR 2023
arXiv
19
citations
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
CVPR 2025
arXiv
12
citations
Step Differences in Instructional Video
CVPR 2024
arXiv
10
citations
VITED: Video Temporal Evidence Distillation
CVPR 2025
arXiv
2
citations
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
ICCV 2025
arXiv
0
citations
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
NEURIPS 2023
0
citations
HT-Step: Aligning Instructional Articles with How-To Videos
NEURIPS 2023
0
citations
Relational Space-Time Query in Long-Form Videos
CVPR 2023
0
citations
Learning to Segment Referred Objects from Narrated Egocentric Videos
CVPR 2024
0
citations