EPIC-Kitchens-100¶

Modality: Egocentric RGB video, multi-channel audio, narrative annotations
Primary Tasks: Action recognition (verb/noun/interaction), anticipation, detection, audio-visual learning
Scale: 700+ hours, 100 kitchens, 45 million frames, 97 participants, 90 verb / 300 noun classes
License: Non-commercial research license; requires agreement with EPIC-Kitchens consortium
Access: https://epic-kitchens.github.io/2021

Summary¶

EPIC-Kitchens-100 extends the original EPIC-Kitchens dataset with additional hours of cooking activity, richer annotations, and new benchmarks. Recordings are captured with head-mounted cameras in home kitchens, providing fine-grained egocentric interactions and natural audio cues.

Reference Paper¶

Dima Damen et al. "EPIC-KITCHENS-100: Challenges for Egocentric Action Recognition." IJCV, 2022. PDF

Benchmarks & Baselines¶

TSN (RGB+Flow) - Top-1 verb: 65.1, noun: 45.3, action: 38.9; Damen et al., 2022.
Temporal Alignment Networks (TAN) - Action anticipation Top-5: 32.1; Miech et al., ECCV 2020.
Official leaderboards available on CodaLab for recognition, detection, and anticipation tasks.

Tooling & Ecosystem¶

EPIC-Kitchens Toolkit with metadata parsers, evaluation scripts, and download helpers.
MMAction2 includes configs for EPIC-Kitchens recognition and anticipation.
Narrations provide time-aligned verb/noun labels and bounding boxes.

Known Challenges¶

Verb/noun imbalance is significant; weighted losses or focal losses are common.
Audio quality varies; consider pre-processing for anticipation tasks.
Storage requirements (~2.2 TB) and download quotas necessitate selective fetching via toolkit.

Cite¶

@article{damen2022epickitchens100,
  title   = {EPIC-KITCHENS-100: Challenges for Egocentric Action Recognition},
  author  = {Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and others},
  journal = {International Journal of Computer Vision},
  year    = {2022}
}