FineGym
- Modality: RGB video (broadcast sports footage)
- Primary Tasks: Fine-grained action recognition, temporal localization, event detection
- Scale: roughly 32,000 annotated sub-action segments drawn from about 300 gymnastics competition videos, with multi-level annotations
- License: Research-only; follows YouTube/host platform terms
- Access: https://sdolivia.github.io/FineGym/
Summary
FineGym focuses on competitive gymnastics footage with hierarchical annotations: three semantic levels (events, sets, and elements) and two temporal levels (actions and sub-actions). These structured labels support fine-grained recognition and temporal parsing, making the dataset a benchmark for telling apart visually similar sports actions and for compositional action modeling.
Reference Paper
- Dian Shao, Yue Zhao, Bo Dai, and Dahua Lin. "FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding." CVPR, 2020.
Benchmarks & Baselines
- TSM + Hierarchical Parsing - Top-1 Event: 86.2, Action: 79.5; Shao et al., 2020.
- MS-TCN - Temporal localization mAP@0.5: 60.3; applied to FineGym splits.
- Evaluation uses train/val/test splits for floor, vault, uneven bars, and balance beam events.
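The mAP@0.5 figure above depends on temporal intersection-over-union (tIoU) between predicted and ground-truth segments. The following is a minimal sketch of that metric for a single class in a single video, not the official FineGym evaluation code. It assumes segments are `(start_sec, end_sec)` tuples, uses greedy matching of each prediction to the best still-unmatched ground truth, and computes non-interpolated average precision; the function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), assumed layout


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Segment]],
    gts: List[Segment],
    thr: float = 0.5,
) -> float:
    """Non-interpolated AP for one class; preds are (score, segment) pairs."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    # Visit predictions from most to least confident.
    for _score, seg in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue  # each ground truth can be claimed only once
            iou = temporal_iou(seg, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Average the precision measured at the rank of each true positive.
    ap, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank
    return ap / len(gts)
```

Averaging `average_precision` over classes gives a mean AP at the chosen threshold. Published protocols differ in how they match predictions and interpolate precision, so numbers from this sketch may not be directly comparable to reported results.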
Tooling & Ecosystem
- Official FineGym toolkit for data parsing and annotation alignment.
- TransFG, a transformer originally proposed for fine-grained image recognition, is a candidate backbone for FineGym-style tasks but needs adaptation to video input.
- MMAction2 integrates FineGym dataloaders and configs.
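Whichever tool is used, annotation alignment usually starts by splitting a clip identifier into its source video and its event and sub-action boundaries. The sketch below assumes the label files contain lines of the form `<video_id>_E_<start>_<end>_A_<start>_<end> <label>`, with numeric fields treated as whole-second offsets. That layout, the `ClipId` class, and `parse_label_line` are illustrative assumptions, so check them against the files you actually download.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed ID layout: <video_id>_E_<start>_<end>_A_<start>_<end>
# The video ID is matched greedily because it may itself contain underscores.
_CLIP_RE = re.compile(
    r"^(?P<video>.+)_E_(?P<es>\d+)_(?P<ee>\d+)_A_(?P<sa>\d+)_(?P<ea>\d+)$"
)


@dataclass(frozen=True)
class ClipId:
    video_id: str
    event_start: int   # assumed: offset within the source video
    event_end: int
    action_start: int  # assumed: offset within the event
    action_end: int


def parse_clip_id(clip_id: str) -> ClipId:
    """Split a clip identifier into its video ID and boundary fields."""
    m = _CLIP_RE.match(clip_id)
    if m is None:
        raise ValueError(f"unrecognised clip id: {clip_id!r}")
    return ClipId(
        video_id=m.group("video"),
        event_start=int(m.group("es")),
        event_end=int(m.group("ee")),
        action_start=int(m.group("sa")),
        action_end=int(m.group("ea")),
    )


def parse_label_line(line: str) -> Tuple[ClipId, int]:
    """Parse one '<clip_id> <integer label>' line."""
    clip_id, label = line.split()
    return parse_clip_id(clip_id), int(label)
```

Anchoring the pattern at the end of the string keeps the parse unambiguous even when a video ID happens to contain `_E_` or `_A_`.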
Known Challenges
- Licensing inherits YouTube policies; download links can expire, requiring manual refresh.
- Class distribution is imbalanced across apparatus types and sub-actions.
- Temporal annotations depend on precise start/end alignment; decode every video at one consistent frame rate so that a given timestamp always maps to the same frame.
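Two of the challenges above lend themselves to small helpers. The sketch below shows one way to map second-based boundaries to a frame range at a known frame rate, and one common way to derive inverse-frequency class weights for an imbalanced label set. Both function names are hypothetical, the half-open frame convention is an assumption, and inverse-frequency weighting is only one of several rebalancing options.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def seconds_to_frame_span(start_sec: float, end_sec: float, fps: float) -> Tuple[int, int]:
    """Map a time interval to a half-open frame range [first, last)."""
    if fps <= 0 or end_sec < start_sec:
        raise ValueError("need fps > 0 and end_sec >= start_sec")
    # Rounding to 6 decimals guards against float noise such as 140.9999999.
    first = math.floor(round(start_sec * fps, 6))
    last = math.ceil(round(end_sec * fps, 6))
    return first, max(last, first + 1)  # always keep at least one frame


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

The same two-second boundary lands on frame 50 at 25 fps but frame 60 at 30 fps, which is why mixing frame rates silently shifts segment boundaries. The weights can be passed to a weighted loss or used to build a sampler.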
Cite
@inproceedings{shao2020finegym,
title = {FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding},
author = {Shao, Dian and Zhao, Yue and Dai, Bo and Lin, Dahua},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020}
}