Dima Damen

2,982 total citations

papers (21)

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022 (arXiv)

1,511 citations

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

CVPR 2024 (arXiv)

343 citations

What Can a Cook in Italy Teach a Mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

ICCV 2023 (arXiv)

Use Your Head: Improving Long-Tail Video Recognition

CVPR 2023 (arXiv)

UnweaveNet: Unweaving Activity Stories

CVPR 2022 (arXiv)

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

CVPR 2023 (arXiv)

Learning from One Continuous Video Stream

CVPR 2024 (arXiv)

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

CVPR 2025 (arXiv)

Learning from Streaming Video with Orthogonal Gradients

CVPR 2025 (arXiv)

Context-Aware Multimodal Pretraining

CVPR 2025 (arXiv)

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

CVPR 2024

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

NeurIPS 2023

papers (21)

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Egocentric Video-Language Pretraining

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

On Semantic Similarity in Video Retrieval

EPIC Fields: Marrying 3D Geometry and Video Understanding

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Action Modifiers: Learning From Adverbs in Instructional Videos

TIM: A Time Interval Machine for Audio-Visual Action Recognition

What Can a Cook in Italy Teach a Mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

Use Your Head: Improving Long-Tail Video Recognition

UnweaveNet: Unweaving Activity Stories

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

Learning from One Continuous Video Stream

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

Learning from Streaming Video with Orthogonal Gradients

Context-Aware Multimodal Pretraining

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Perception Test: A Diagnostic Benchmark for Multimodal Video Models