HAA500

  • Modality: RGB video (web-sourced clips)
  • Primary Tasks: Atomic action recognition, fine-grained action classification
  • Scale: 10,000+ video clips, 500 atomic action classes
  • License: Research use only
  • Access: https://www.cse.ust.hk/haa/

Summary

HAA500 (Human Atomic Actions 500) is a video dataset containing 500 classes of atomic human actions with approximately 10,000 manually curated clips. Unlike datasets with composite or complex activities, HAA500 focuses on atomic actions — fundamental units of human movement that cannot be further decomposed (e.g., "raise left hand", "kick with right foot", "turn head left"). The fine-grained, atomic nature of the action vocabulary makes HAA500 particularly useful for studying action compositionality, few-shot action recognition, and building hierarchical action understanding systems. Each class contains roughly 20 clips, emphasizing diversity over quantity.

Reference Paper

  • Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang. "HAA500: Human-Centric Atomic Action Dataset with Curated Videos." ICCV, 2021.

Benchmarks & Baselines

  • TSM - Top-1: ~48% — Chung et al., ICCV 2021.
  • SlowFast R50 - Top-1: ~52% — Chung et al., ICCV 2021.
  • Few-shot (5-way 5-shot): ~65% — reported with a ProtoNet baseline.
  • Standard evaluation uses the official train/val/test splits; top-1 and top-5 accuracy are reported.
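The 5-way 5-shot protocol above can be sketched with a prototypical-network classifier. This is a minimal illustration, not the paper's implementation: the random vectors below stand in for embeddings that a video backbone (e.g. TSM or SlowFast) would produce for HAA500 support and query clips.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-way 5-shot episode: 5 classes, 5 support clips each.
# Class centres and noise are synthetic placeholders for real features.
n_way, k_shot, dim = 5, 5, 128
centres = 3.0 * rng.normal(size=(n_way, 1, dim))
support = centres + 0.5 * rng.normal(size=(n_way, k_shot, dim))
query = centres[:, 0] + 0.5 * rng.normal(size=(n_way, dim))

# Prototypical networks: each class is the mean of its support
# embeddings; a query is assigned to the nearest prototype.
prototypes = support.mean(axis=1)                          # (n_way, dim)
dists = np.linalg.norm(query[:, None] - prototypes[None], axis=-1)
pred = dists.argmin(axis=1)

accuracy = (pred == np.arange(n_way)).mean()
print(accuracy)
```

Episode accuracy is then averaged over many randomly sampled episodes to produce the reported few-shot number.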

Tooling & Ecosystem

  • Official project page provides video URLs, annotations, and download scripts.
  • HAA500 GitHub contains data preparation and baseline code.
  • Compatible with standard video classification frameworks (MMAction2, PySlowFast, TimeSformer).
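Frameworks like MMAction2 typically consume a plain-text annotation file mapping each clip to an integer class index. A minimal sketch of building and parsing such a file follows; the class names, paths, and "path label" layout here are illustrative assumptions, not the official HAA500 annotation format.

```python
# Sketch: class-name -> index mapping plus a parser for annotation
# lines of the form "<relative_path> <label>". Names and paths are
# hypothetical examples, not the official HAA500 files.

def build_label_map(class_names):
    """Assign a contiguous integer index to each sorted class name."""
    return {name: i for i, name in enumerate(sorted(class_names))}

def parse_annotation(lines):
    """Parse 'path label' lines into (path, int_label) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples

label_map = build_label_map(
    ["raise_left_hand", "kick_right_foot", "turn_head_left"]
)
ann = [
    "raise_left_hand/clip_000.mp4 {}".format(label_map["raise_left_hand"]),
    "kick_right_foot/clip_001.mp4 {}".format(label_map["kick_right_foot"]),
]
samples = parse_annotation(ann)
print(samples)
```

Sorting the class names before indexing keeps the mapping reproducible across runs, which matters when checkpoints are shared between frameworks.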

Known Challenges

  • 500 classes with only ~20 clips each creates a challenging low-data regime per class.
  • Atomic action classes can be visually very similar (e.g., "raise left hand" vs "raise right hand"), requiring fine-grained spatial reasoning.
  • Web-sourced videos vary in quality, resolution, and background.
  • The class taxonomy is manually defined, so whether some actions are truly "atomic" is debatable.
  • Small number of clips per class makes the dataset more suited for few-shot or transfer learning than standard large-scale training.

Cite

@inproceedings{chung2021haa500,
  title     = {HAA500: Human-Centric Atomic Action Dataset with Curated Videos},
  author    = {Chung, Jihoon and Wuu, Cheng-hsin and Yang, Hsuan-ru and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2021}
}