EPIC-Kitchens-100
- Modality: Egocentric RGB video, multi-channel audio, narration-based annotations
- Primary Tasks: Action recognition (verb/noun/interaction), anticipation, detection, audio-visual learning
- Scale: about 100 hours across 700 variable-length videos, 45 kitchens, 37 participants, roughly 20 million frames, about 90K action segments, 97 verb / 300 noun classes
- License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0); non-commercial research use only
- Access: https://epic-kitchens.github.io/2021
Summary
EPIC-Kitchens-100 extends the original EPIC-Kitchens dataset (EPIC-KITCHENS-55) to roughly 100 hours of unscripted kitchen activity, with richer annotations and new benchmarks. Recordings are captured with head-mounted cameras in participants' home kitchens, so they contain fine-grained egocentric hand-object interactions along with natural audio cues.
Reference Paper
- Dima Damen et al. "Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100." IJCV, 2022.
Benchmarks & Baselines
- TSN (RGB+Flow) - Top-1 accuracy (%): verb 65.1, noun 45.3, action 38.9; Damen et al., 2022.
- Temporal Alignment Networks (TAN) - Action anticipation Top-5: 32.1; Miech et al., ECCV 2020.
- Official leaderboards are hosted on CodaLab for the recognition, detection, and anticipation tasks.
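To make the verb/noun/action split in the numbers above concrete, here is a minimal sketch of how the three Top-1 accuracies relate. It assumes the common convention that an action counts as correct only when both its verb and its noun are correct; the function name `top1_accuracies` and the toy labels are illustrative and not taken from the official evaluation scripts.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]  # (verb_class, noun_class)


def top1_accuracies(predicted: List[Label], truth: List[Label]) -> Dict[str, float]:
    """Return Top-1 accuracy (%) for verbs, nouns, and actions.

    Assumption: an action is correct only if verb AND noun are both correct.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and of equal length")

    verb_hits = noun_hits = action_hits = 0
    for (pred_verb, pred_noun), (true_verb, true_noun) in zip(predicted, truth):
        verb_ok = pred_verb == true_verb
        noun_ok = pred_noun == true_noun
        verb_hits += verb_ok
        noun_hits += noun_ok
        action_hits += verb_ok and noun_ok

    n = len(truth)
    return {
        "verb": 100.0 * verb_hits / n,
        "noun": 100.0 * noun_hits / n,
        "action": 100.0 * action_hits / n,
    }


# Toy example with made-up class ids (not real dataset labels).
predicted = [(0, 5), (1, 5), (2, 7), (0, 9)]
truth = [(0, 5), (1, 6), (3, 7), (0, 9)]
scores = top1_accuracies(predicted, truth)
```

Under this convention action accuracy can never exceed the lower of the verb and noun accuracies, which is worth keeping in mind when comparing reported results.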
Tooling & Ecosystem
- EPIC-Kitchens Toolkit with metadata parsers, evaluation scripts, and download helpers.
- MMAction2 includes configs for EPIC-Kitchens recognition and anticipation.
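- Narration-derived annotations provide time-aligned verb/noun labels for each action segment; hand and object bounding boxes are distributed separately as automatically generated detections.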
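As an illustration of working with time-aligned labels, the sketch below parses a small annotation table using only the Python standard library. The column names follow the layout of the published annotation CSVs as far as we are aware, but treat them as assumptions and check them against the files you download; the two sample rows and their class ids are made up for the example.

```python
import csv
import io
from typing import Dict, List

# Inline stand-in for an annotation file; the rows are illustrative, not real data.
SAMPLE_CSV = """narration_id,video_id,start_timestamp,stop_timestamp,verb_class,noun_class
P01_01_0,P01_01,00:00:01.50,00:00:03.25,3,3
P01_01_1,P01_01,00:01:05.00,00:01:07.40,0,13
"""


def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.ss' timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def load_segments(text: str) -> List[Dict[str, object]]:
    """Parse annotation rows into dicts with numeric times and class ids."""
    segments = []
    for row in csv.DictReader(io.StringIO(text)):
        segments.append(
            {
                "narration_id": row["narration_id"],
                "video_id": row["video_id"],
                "start": timestamp_to_seconds(row["start_timestamp"]),
                "stop": timestamp_to_seconds(row["stop_timestamp"]),
                "verb_class": int(row["verb_class"]),
                "noun_class": int(row["noun_class"]),
            }
        )
    return segments


segments = load_segments(SAMPLE_CSV)
```

For a real file, replace `io.StringIO(text)` with an opened file handle (for example `open(path, newline="")`).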
Known Challenges
- Verb and noun classes are heavily imbalanced (long-tailed); class-weighted or focal losses are common mitigations.
- Audio quality varies across recordings; consider pre-processing (for example, resampling and loudness normalization) when using audio for anticipation tasks.
- The full download is large (~2.2 TB) and subject to download quotas, so selective fetching via the toolkit's download helpers is usually necessary.
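The class-imbalance mitigations mentioned above can be sketched in a few lines. The following is a minimal, framework-free illustration of inverse-frequency class weights and a single-example focal loss; it is not the loss used by any of the cited baselines, and the function names and the toy class counts are assumptions chosen for the example.

```python
import math
from typing import List, Optional, Sequence


def inverse_frequency_weights(counts: Sequence[int]) -> List[float]:
    """Weight each class by total / (num_classes * count).

    Rare classes get larger weights. Every count must be positive.
    """
    total = sum(counts)
    num_classes = len(counts)
    return [total / (num_classes * c) for c in counts]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,
    class_weights: Optional[Sequence[float]] = None,
) -> float:
    """Focal loss for one example: -w_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and no weights this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy long-tailed class counts (hypothetical, not dataset statistics).
weights = inverse_frequency_weights([90, 9, 1])
example_logits = [2.0, 0.5, -1.0]
plain_ce = focal_loss(example_logits, target=0, gamma=0.0)
focal = focal_loss(example_logits, target=0, gamma=2.0)
weighted_rare = focal_loss(example_logits, target=2, gamma=2.0, class_weights=weights)
```

Increasing `gamma` down-weights examples the model already classifies confidently, while the class weights boost rare verbs or nouns; in practice both are hyperparameters to tune on the validation split.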