FineGym

Modality: RGB video (broadcast sports footage)
Primary Tasks: Fine-grained action recognition, temporal localization, event detection
Scale: 32,000 video segments, 300 gymnastics events, multi-level annotations
License: Research-only; follows YouTube/host platform terms
Access: https://sdolivia.github.io/FineGym/

Summary

FineGym covers elite competitive gymnastics and provides hierarchical annotations that capture routines, sub-actions, and body movements. Its structured labels support fine-grained recognition and temporal parsing, making it a benchmark for fine-grained sports understanding and compositional action modeling.
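To make the idea of hierarchical annotation concrete, the sketch below shows one way a routine and its nested sub-actions could be represented in memory. The class names, field names, label strings, and the convention that sub-action times are relative to the enclosing event are all assumptions for illustration; they are not the official annotation schema, and the example values are not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubAction:
    """A fine-grained segment inside an event.

    Times are in seconds, assumed relative to the start of the event.
    """
    label: str
    start: float
    end: float


@dataclass
class Event:
    """A coarse segment, e.g. one routine on one apparatus.

    Times are in seconds within the source video.
    """
    label: str
    start: float
    end: float
    sub_actions: List[SubAction] = field(default_factory=list)

    def absolute_sub_actions(self) -> List[Tuple[str, float, float]]:
        """Return (label, start, end) shifted into the source video's timeline."""
        return [
            (s.label, self.start + s.start, self.start + s.end)
            for s in self.sub_actions
        ]


# Illustrative values only (hypothetical labels and timestamps).
routine = Event(
    label="balance_beam",
    start=120.0,
    end=210.0,
    sub_actions=[SubAction("leap", 3.0, 4.5), SubAction("dismount", 80.0, 83.0)],
)
print(routine.absolute_sub_actions())
```

Keeping sub-actions nested under their event makes it straightforward to train at either level: the event label gives the coarse class, and the nested segments give the fine-grained targets.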

Reference Paper

Dian Shao et al. "FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." CVPR, 2020.

Benchmarks & Baselines

TSM + Hierarchical Parsing - Top-1 Event: 86.2, Action: 79.5; Shao et al., 2020.
MS-TCN - Temporal localization mAP@0.5: 60.3; applied to FineGym splits.
Evaluation uses train/val/test splits for floor, vault, uneven bars, and balance beam events.
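The localization metric above can be read more easily with the overlap test spelled out. The sketch below assumes that "@0.5" denotes a temporal IoU threshold of 0.5, as is common in temporal localization; the function names are hypothetical. It covers only the overlap criterion. A full mAP computation additionally ranks predictions by confidence and matches each ground-truth segment at most once.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction counts as a match if its temporal IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold


# Overlap of 3 s over a union of 5 s gives IoU 0.6, which passes at 0.5.
print(is_true_positive((10.0, 14.0), (11.0, 15.0)))
# Overlap of 1 s over a union of 5 s gives IoU 0.2, which fails at 0.5.
print(is_true_positive((10.0, 12.0), (11.0, 15.0)))
```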

Tooling & Ecosystem

Official FineGym toolkit for data parsing and annotation alignment.
TransFG demonstrates transformer-based fine-grained recognition using FineGym.
MMAction2 integrates FineGym dataloaders and configs.

Known Challenges

Licensing inherits YouTube policies; download links can expire, requiring manual refresh.
Class distribution is imbalanced across apparatus types and sub-actions.
Temporal annotations require precise start/end alignment; decode every video at the same frame rate so that annotated timestamps map to the intended frames.
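Two of the challenges above lend themselves to a short sketch: mapping second-based annotations to frame indices at a fixed frame rate, and weighting classes to counter imbalance. The function names are hypothetical. The weighting uses the common inverse-frequency heuristic (samples / (classes × count)), which is one option among several and is not a method prescribed by the dataset authors.

```python
from collections import Counter


def seconds_to_frame(t, fps):
    """Map a timestamp in seconds to the nearest frame index at the given frame rate."""
    return int(round(t * fps))


def segment_to_frames(start, end, fps):
    """Convert a (start, end) annotation in seconds to a half-open frame range [first, last)."""
    first = seconds_to_frame(start, fps)
    last = max(first + 1, seconds_to_frame(end, fps))  # keep at least one frame
    return first, last


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / count.

    Normalized so that the weights average to 1 over the samples.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * n) for c, n in counts.items()}


# The same annotation lands on different frames at different frame rates,
# which is why a single decoding rate should be used throughout.
print(segment_to_frames(2.0, 3.2, fps=25))
print(segment_to_frames(2.0, 3.2, fps=30))

# Rare classes receive larger weights than frequent ones.
print(inverse_frequency_weights(["vault", "floor", "floor", "floor"]))
```

The resulting weights can be passed to a weighted loss or used to build a sampler; which works better depends on the model and the degree of imbalance.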

Cite

@inproceedings{shao2020finegym,
  title     = {FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding},
  author    = {Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020}
}