Ego-Exo4D¶
- Modality: Synchronized egocentric and exocentric RGB video, audio, motion capture, text transcripts
- Primary Tasks: Cross-view action understanding, third-person to first-person translation, 4D reconstruction
- Scale: 1,422 sequences, 20+ hours ego video, 120+ hours exo video, 40 action categories
- License: Research license (non-commercial); access requires accepting the dataset agreement
- Access: https://ego-exo4d-data.org/
Summary¶
Ego-Exo4D presents paired first- and third-person views of skilled human activities with synchronized audio and motion capture. The dataset enables cross-view domain adaptation, egocentric-exocentric translation, and holistic 4D reasoning about interaction-intensive tasks (e.g., cooking, musical performance).
Reference Paper¶
- Kristen Grauman et al. "Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives." arXiv, 2023.
Benchmarks & Baselines¶
- Cross-view Action Recognition Baseline - Top-1 accuracy (exo-to-ego): 46.5%; Grauman et al., 2023.
- Pose Estimation with Motion Capture Supervision - MPJPE: 28.6 mm for ego/exo fusion.
- Tasks include cross-view action classification, 4D pose reconstruction, and audio-visual alignment; the official metrics and evaluation protocol are described in the paper.
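The exact evaluation protocol is defined in the paper and toolkit; as an illustration of the headline metrics above, here is a minimal PyTorch sketch of Top-1 accuracy and MPJPE (mean per-joint position error). Tensor shapes and the metre-to-millimetre convention are assumptions for illustration, not the toolkit's native formats.

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Top-1 accuracy for action classification.

    logits: (N, num_classes) class scores; labels: (N,) integer class ids.
    """
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()

def mpjpe_mm(pred_joints: torch.Tensor, gt_joints: torch.Tensor) -> float:
    """Mean per-joint position error in millimetres.

    Both tensors are (N, J, 3) joint positions in metres; these shapes and
    units are assumptions, not the official annotation format.
    """
    per_joint_error = torch.linalg.norm(pred_joints - gt_joints, dim=-1)  # (N, J)
    return per_joint_error.mean().item() * 1000.0
```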
Tooling & Ecosystem¶
- Official ego-exo4d toolkit for download, preprocessing, and baseline models.
- Integration examples for PyTorch3D and Detectron2 provided.
- Compatible with Ego4D metadata schemas for multi-dataset experimentation.
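For orientation, a minimal sketch of reading a synchronized ego/exo frame pair from downloaded data. The directory layout, the `metadata.json` file, its `ego_video` / `exo_videos` keys, and the `load_paired_clip` helper are all hypothetical placeholders, not the official toolkit API; consult the toolkit documentation for the real schema.

```python
import json
from pathlib import Path

import cv2  # OpenCV for frame decoding

def load_paired_clip(sequence_dir: str, frame_idx: int):
    """Return (ego_frame, exo_frame) RGB arrays for one sequence.

    Assumes a hypothetical layout: one directory per sequence containing a
    metadata JSON that lists the paired ego/exo video files.
    """
    meta = json.loads((Path(sequence_dir) / "metadata.json").read_text())
    ego_path = Path(sequence_dir) / meta["ego_video"]      # assumed key
    exo_path = Path(sequence_dir) / meta["exo_videos"][0]  # assumed key

    def read_frame(video_path: Path, idx: int):
        cap = cv2.VideoCapture(str(video_path))
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise IOError(f"Could not read frame {idx} from {video_path}")
        return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    return read_frame(ego_path, frame_idx), read_frame(exo_path, frame_idx)
```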
Known Challenges¶
- Large data volume and multi-camera synchronization demand significant storage and careful handling of timestamps; see the alignment sketch after this list.
- Licensing prohibits commercial use and redistribution of raw footage; review the terms before releasing derivatives.
- Motion capture coverage varies by sequence; some activities have partial mocap data.
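As a concrete illustration of the timestamp-handling issue, here is a minimal alignment sketch that pairs ego and exo frames whose capture times fall within a tolerance. It assumes sorted per-frame timestamps on a shared clock, which may differ from the dataset's actual synchronization metadata.

```python
import bisect

def align_by_timestamp(ego_ts: list[float], exo_ts: list[float],
                       tolerance: float = 1 / 60) -> list[tuple[int, int]]:
    """Return (ego_frame_idx, exo_frame_idx) pairs whose capture times differ
    by at most `tolerance` seconds.

    Both timestamp lists are assumed sorted and expressed on a shared clock;
    the default tolerance is roughly half a frame at 30 fps.
    """
    pairs = []
    for i, t in enumerate(ego_ts):
        j = bisect.bisect_left(exo_ts, t)
        # Check the nearest exo timestamps on either side of the insertion point.
        for k in (j - 1, j):
            if 0 <= k < len(exo_ts) and abs(exo_ts[k] - t) <= tolerance:
                pairs.append((i, k))
                break
    return pairs
```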