HAA500¶
- Modality: RGB video (web-sourced clips)
- Primary Tasks: Atomic action recognition, fine-grained action classification
- Scale: 10,000+ video clips, 500 atomic action classes
- License: Research use only
- Access: https://www.cse.ust.hk/haa/
Summary¶
HAA500 (Human Atomic Actions 500) is a video dataset containing 500 classes of atomic human actions with approximately 10,000 manually curated clips. Unlike datasets with composite or complex activities, HAA500 focuses on atomic actions — fundamental units of human movement that cannot be further decomposed (e.g., "raise left hand", "kick with right foot", "turn head left"). The fine-grained, atomic nature of the action vocabulary makes HAA500 particularly useful for studying action compositionality, few-shot action recognition, and building hierarchical action understanding systems. Each class contains roughly 20 clips, emphasizing diversity over quantity.
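Because the vocabulary is atomic, label names can often be decomposed into reusable parts (verb, side, body part), which is what makes the dataset attractive for compositionality studies. A minimal sketch of such a decomposition is below; the underscore-delimited label strings are illustrative stand-ins, not the official HAA500 taxonomy.

```python
def parse_label(label: str) -> dict:
    """Split an underscore-delimited atomic-action label into a verb
    plus modifier tokens (e.g., side and body part).

    Note: the label format here is a simplifying assumption for
    illustration; HAA500's own class names may differ.
    """
    tokens = label.split("_")
    return {"verb": tokens[0], "modifiers": tokens[1:]}

# Hypothetical atomic labels mirroring the examples in the summary.
for name in ["raise_left_hand", "kick_right_foot", "turn_head_left"]:
    print(parse_label(name))
```

A compositional model could embed the verb and modifiers separately and combine them, rather than learning 500 unrelated class embeddings from ~20 clips each.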
Reference Paper¶
- Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang. "HAA500: Human-Centric Atomic Action Dataset with Curated Videos." ICCV, 2021.
Benchmarks & Baselines¶
- TSM - Top-1: ~48% — Chung et al., ICCV 2021.
- SlowFast R50 - Top-1: ~52% — Chung et al., ICCV 2021.
- Few-shot (5-way 5-shot): ~65% — reported with a ProtoNet baseline.
- Standard evaluation uses the official train/val/test splits; top-1 and top-5 accuracy are reported.
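The top-1/top-5 metric used above is standard for video classification: a prediction counts as correct if the true class appears among the model's k highest-scoring classes. A minimal NumPy sketch (the toy logits and labels are illustrative, not HAA500 data):

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes.

    logits: (N, C) array of per-class scores.
    labels: (N,) array of ground-truth class indices.
    """
    topk = np.argsort(-logits, axis=1)[:, :k]        # (N, k) highest-scoring classes
    hits = (topk == labels[:, None]).any(axis=1)     # is the true label among them?
    return float(hits.mean())

# Toy example: 3 samples over 3 classes.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 2, 0])
print(topk_accuracy(logits, labels, k=1))  # only the first sample is a top-1 hit
```

For HAA500 the same function would run with C = 500 and k = 1 or 5.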
Tooling & Ecosystem¶
- Official project page provides video URLs, annotations, and download scripts.
- HAA500 GitHub contains data preparation and baseline code.
- Compatible with standard video classification frameworks (MMAction2, PySlowFast, TimeSformer).
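To feed clips into any of these frameworks you first need a (clip path, label) index. The sketch below assumes a `<root>/<class_name>/<clip>.mp4` directory layout — an assumption for illustration, not the official HAA500 structure; adapt the globbing to however the download scripts organize the data.

```python
from pathlib import Path

def build_index(root: str) -> list:
    """Build a sorted list of (clip_path, class_index) pairs.

    Assumes one subdirectory per action class, each containing .mp4 clips
    (a hypothetical layout; check the official data-preparation scripts).
    """
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_idx = {c: i for i, c in enumerate(classes)}  # deterministic label ids
    return [(str(clip), class_to_idx[c])
            for c in classes
            for clip in sorted(Path(root, c).glob("*.mp4"))]
```

Sorting class names before assigning indices keeps label ids stable across machines, which matters when sharing checkpoints.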
Known Challenges¶
- 500 classes with only ~20 clips each create a challenging low-data regime per class.
- Atomic action classes can be visually very similar (e.g., "raise left hand" vs "raise right hand"), requiring fine-grained spatial reasoning.
- Web-sourced videos vary in quality, resolution, and background.
- The class taxonomy is manually defined; whether some actions are truly "atomic" is debatable.
- Small number of clips per class makes the dataset more suited for few-shot or transfer learning than standard large-scale training.
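Given the ~20 clips per class, a common evaluation protocol is episodic few-shot sampling (e.g., the 5-way 5-shot setting in the benchmarks above). A minimal sketch of an episode sampler, assuming a flat list of (clip_path, label) pairs; the query-set size is an illustrative choice:

```python
import random

def sample_episode(index, n_way=5, k_shot=5, q_queries=2, seed=None):
    """Sample one N-way K-shot episode from (clip_path, label) pairs.

    With only ~20 clips per class, k_shot + q_queries must not exceed the
    per-class clip count, so classes with too few clips are skipped.
    """
    rng = random.Random(seed)
    by_class = {}
    for path, label in index:
        by_class.setdefault(label, []).append(path)
    eligible = [c for c, clips in by_class.items()
                if len(clips) >= k_shot + q_queries]
    classes = rng.sample(eligible, n_way)   # N distinct classes
    support, query = [], []
    for c in classes:
        clips = rng.sample(by_class[c], k_shot + q_queries)
        support += [(p, c) for p in clips[:k_shot]]
        query += [(p, c) for p in clips[k_shot:]]
    return support, query
```

A ProtoNet-style baseline would average support-clip embeddings per class into prototypes and classify each query clip by nearest prototype.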