HAA500¶
- Modality: RGB video (web-sourced clips)
- Primary Tasks: Atomic action recognition, fine-grained action classification
- Scale: 10,000+ video clips, 500 atomic action classes
- License: Research use only
- Access: https://www.cse.ust.hk/haa/
Summary¶
HAA500 (Human Atomic Actions 500) is a video dataset containing 500 classes of atomic human actions with approximately 10,000 manually curated clips. Unlike datasets with composite or complex activities, HAA500 focuses on atomic actions — fundamental units of human movement that cannot be further decomposed (e.g., "raise left hand", "kick with right foot", "turn head left"). The fine-grained, atomic nature of the action vocabulary makes HAA500 particularly useful for studying action compositionality, few-shot action recognition, and building hierarchical action understanding systems. Each class contains roughly 20 clips, emphasizing diversity over quantity.
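Because the vocabulary is atomic, label names can often be decomposed into reusable parts (verb, side, body part), which is what makes the dataset attractive for compositionality studies. A minimal sketch of such a decomposition is below; the underscore-delimited label strings are illustrative stand-ins, not the official HAA500 taxonomy.

```python
def parse_label(label: str) -> dict:
    """Split an underscore-delimited atomic-action label into a verb
    plus modifier tokens (e.g., side and body part).

    Note: the label format here is a simplifying assumption for
    illustration; HAA500's own class names may differ.
    """
    tokens = label.split("_")
    return {"verb": tokens[0], "modifiers": tokens[1:]}

# Hypothetical atomic labels mirroring the examples in the summary.
for name in ["raise_left_hand", "kick_right_foot", "turn_head_left"]:
    print(parse_label(name))
```

A compositional model could embed the verb and modifiers separately and combine them, rather than learning 500 unrelated class embeddings from ~20 clips each.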
Reference Paper¶
- Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang. "HAA500: Human-Centric Atomic Action Dataset with Curated Videos." ICCV, 2021.
Benchmarks & Baselines¶
- TSM - Top-1: ~48% — Chung et al., ICCV 2021.
- SlowFast R50 - Top-1: ~52% — Chung et al., ICCV 2021.
- Few-shot (5-way 5-shot): ~65% — reported with a ProtoNet baseline.
- Standard evaluation uses the official train/val/test splits; top-1 and top-5 accuracy are reported.
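The top-1/top-5 metric used above is standard for video classification: a prediction counts as correct if the true class appears among the model's k highest-scoring classes. A minimal NumPy sketch (the toy logits and labels are illustrative, not HAA500 data):

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes.

    logits: (N, C) array of per-class scores.
    labels: (N,) array of ground-truth class indices.
    """
    topk = np.argsort(-logits, axis=1)[:, :k]        # (N, k) highest-scoring classes
    hits = (topk == labels[:, None]).any(axis=1)     # is the true label among them?
    return float(hits.mean())

# Toy example: 3 samples over 3 classes.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 0])
print(topk_accuracy(logits, labels, k=1))  # only the first sample is a top-1 hit
```

For HAA500 the same function would run with C = 500 and k = 1 or 5.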
Tooling & Ecosystem¶
- Official project page provides video URLs, annotations, and download scripts.
- HAA500 GitHub contains data preparation and baseline code.
- Compatible with standard video classification frameworks (MMAction2, PySlowFast, TimeSformer).
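To feed clips into any of these frameworks you first need a (clip path, label) index. The sketch below assumes a `<root>/<class_name>/<clip>.mp4` directory layout — an assumption for illustration, not the official HAA500 structure; adapt the globbing to however the download scripts organize the data.

```python
from pathlib import Path

def build_index(root: str) -> list:
    """Build a sorted list of (clip_path, class_index) pairs.

    Assumes one subdirectory per action class, each containing .mp4 clips
    (a hypothetical layout; check the official data-preparation scripts).
    """
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {c: i for i, c in enumerate(classes)}  # deterministic label ids
    return [(str(clip), class_to_idx[c])
            for c in classes
            for clip in sorted(Path(root, c).glob("*.mp4"))]
```

Sorting class names before assigning indices keeps label ids stable across machines, which matters when sharing checkpoints.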
Known Challenges¶
- 500 classes with only ~20 clips each create a challenging low-data regime per class.
- Atomic action classes can be visually very similar (e.g., "raise left hand" vs "raise right hand"), requiring fine-grained spatial reasoning.
- Web-sourced videos vary in quality, resolution, and background.
- The class taxonomy is manually defined; whether some actions are truly "atomic" is debatable.
- Small number of clips per class makes the dataset more suited for few-shot or transfer learning than standard large-scale training.
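Given the ~20 clips per class, a common evaluation protocol is episodic few-shot sampling (e.g., the 5-way 5-shot setting in the benchmarks above). A minimal sketch of an episode sampler, assuming a flat list of (clip_path, label) pairs; the query-set size is an illustrative choice:

```python
import random

def sample_episode(index, n_way=5, k_shot=5, q_queries=2, seed=None):
    """Sample one N-way K-shot episode from (clip_path, label) pairs.

    With only ~20 clips per class, k_shot + q_queries must not exceed the
    per-class clip count, so classes with too few clips are skipped.
    """
    rng = random.Random(seed)
    by_class = {}
    for path, label in index:
        by_class.setdefault(label, []).append(path)
    eligible = [c for c, clips in by_class.items()
                if len(clips) >= k_shot + q_queries]
    classes = rng.sample(eligible, n_way)   # N distinct classes
    support, query = [], []
    for c in classes:
        clips = rng.sample(by_class[c], k_shot + q_queries)
        support += [(p, c) for p in clips[:k_shot]]
        query += [(p, c) for p in clips[k_shot:]]
    return support, query
```

A ProtoNet-style baseline would average support-clip embeddings per class into prototypes and classify each query clip by nearest prototype.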