Multi-Dimensional Dataset Taxonomy
In addition to the primary modality-based organization, this page groups datasets along four other axes: primary task, license type, scale, and year of release.
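The groupings on this page can also be combined into a single query. The following is a minimal sketch, assuming each dataset is stored as one record with a field per axis. The `DatasetEntry` class, its field names, and the `find` helper are hypothetical and not part of any published tooling; the four sample rows are copied from the lists on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row. Field names are illustrative, not an official schema."""
    name: str
    task: str
    license_group: str
    year: int
    samples: Optional[int] = None  # None when scale is not given as a sample count


# A few entries copied from the groupings on this page (not exhaustive).
CATALOG = [
    DatasetEntry("UCI-HAR", "Wearable / Sensor-Based HAR", "Fully Open", 2012),
    DatasetEntry("PAMAP2", "Wearable / Sensor-Based HAR", "Fully Open", 2012),
    DatasetEntry("Human3.6M", "3D Pose Estimation & Motion Capture", "Research-Only", 2014),
    DatasetEntry("HumanML3D", "Motion Generation & Language-Motion",
                 "Non-Commercial Research License", 2022, 14_000),
]


def find(catalog: List[DatasetEntry], **criteria) -> List[DatasetEntry]:
    """Return the entries whose fields equal every given criterion."""
    return [
        entry for entry in catalog
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


# Example: fully open wearable datasets.
open_wearable = find(CATALOG, task="Wearable / Sensor-Based HAR", license_group="Fully Open")
print([entry.name for entry in open_wearable])
```

Exact-match filtering keeps the sketch short; range queries (for example, on `year` or `samples`) would need a predicate-based variant.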
By Primary Task
Action Recognition (Classify what action is happening)
Temporal Action Detection (Detect when actions happen in untrimmed video)
| Dataset | Modality | Scale |
| --- | --- | --- |
| ActivityNet | RGB video | 20k videos / 200 classes |
| AVA | RGB video | 430 clips / 80 atomic actions |
| PKU-MMD | Skeleton + RGB | 20k instances / 51 classes |
| FineGym | RGB video | 32k segments / hierarchical |
| Diving48 | RGB video | 18k clips / 48 classes |
Wearable / Sensor-Based HAR
| Dataset | Sensors | Scale |
| --- | --- | --- |
| UCI-HAR | Smartphone IMU | 30 subjects / 6 activities |
| PAMAP2 | IMU + HR | 9 subjects / 18 activities |
| WISDM | Phone + watch | 51 subjects / 18 activities |
| OPPORTUNITY | 72-channel wearable + ambient | 4 subjects |
| HAPT | Smartphone IMU | 30 subjects / 12 activities |
| RealWorld HAR | Phone + watch | 15 subjects / 8 activities |
| mHealth | Body sensors + ECG | 10 subjects / 12 activities |
| UniMiB-SHAR | Smartphone accelerometer | 30 subjects / 17 activities |
| Daphnet | Wearable accelerometer | 10 subjects / gait freezing |
| Sussex-Huawei Locomotion | Phone + watch | 3 users / 2800+ hours |
3D Pose Estimation & Motion Capture
| Dataset | Modality | Scale |
| --- | --- | --- |
| AMASS | SMPL parameters | 16k mins / 344 subjects |
| Human3.6M | Mocap + RGB | 3.6M frames |
| BABEL | SMPL + text | 43 hrs / 3.7k sequences |
| TotalCapture | Mocap + multi-view + IMU | 5 subjects |
Egocentric / First-Person Vision
| Dataset | Modality | Scale |
| --- | --- | --- |
| EPIC-Kitchens-100 | Ego RGB + audio | 100 hrs / 700 videos / 45 kitchens |
| Ego4D | Ego RGB + stereo + audio | 3.3k hrs |
| Ego-Exo4D | Ego + exo RGB | 1.4k seq / 20 hrs |
| HOI4D | Ego RGB-D + hand pose | 4k+ clips |
Motion Generation & Language-Motion
| Dataset | Modality | Scale |
| --- | --- | --- |
| HumanML3D | SMPL + text | 14k+ sequences |
| InterHuman | SMPL-X + text | 6k+ interactions |
| BABEL | SMPL + text | 43 hrs / 250 action classes |
| Motion-X | Full-body mocap | 2M frames |
Human-Object Interaction
Multi-Person Interaction
Dense Video Captioning / Video-Language
Sign Language
| Dataset | Modality | Scale |
| --- | --- | --- |
| How2Sign | RGB + depth + pose | 80 hrs / ASL |
Clinical / Health
| Dataset | Application | Scale |
| --- | --- | --- |
| Daphnet | Parkinson's gait freezing | 10 subjects |
| mHealth | Mobile health monitoring | 10 subjects |
| FineBio | Biology lab procedures | Multi-step |
By License Type
Fully Open (CC BY, Apache, MIT)
- Kinetics-700, Something-Something V2, UCI-HAR, PAMAP2, HAPT, RealWorld HAR, mHealth, TotalCapture
Creative Commons Non-Commercial
- Charades (CC BY-NC), WISDM (CC BY-NC-SA), OPPORTUNITY (CC BY-NC)
Research-Only (Application Required)
- NTU RGB+D 60/120, Human3.6M, PKU-MMD, EPIC-Kitchens-100, Toyota Smarthome
Non-Commercial Research License
- Ego4D, BABEL, AMASS, Ego-Exo4D, Motion-X, HumanML3D, InterHuman
- UCF-101, HMDB-51, ActivityNet, AVA, Moments in Time, BEHAVE
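The license groups above lend themselves to a first-pass filter before reading the actual license text. The sketch below is illustrative only: the `LicenseGroup` enum, the `MEMBERS` table, and the helper functions are hypothetical names, the membership sets are a partial copy of the lists above, and the commercial-use flag is an assumption derived from the group headings rather than legal advice.

```python
from enum import Enum
from typing import Optional


class LicenseGroup(Enum):
    FULLY_OPEN = "Fully Open (CC BY, Apache, MIT)"
    CC_NON_COMMERCIAL = "Creative Commons Non-Commercial"
    RESEARCH_ONLY = "Research-Only (Application Required)"
    NON_COMMERCIAL_RESEARCH = "Non-Commercial Research License"


# Partial membership copied from the lists above (not exhaustive).
MEMBERS = {
    LicenseGroup.FULLY_OPEN: {"Kinetics-700", "UCI-HAR", "PAMAP2", "HAPT", "mHealth"},
    LicenseGroup.CC_NON_COMMERCIAL: {"Charades", "WISDM", "OPPORTUNITY"},
    LicenseGroup.RESEARCH_ONLY: {"Human3.6M", "PKU-MMD", "EPIC-Kitchens-100"},
    LicenseGroup.NON_COMMERCIAL_RESEARCH: {"Ego4D", "BABEL", "AMASS", "HumanML3D"},
}


def license_group_of(name: str) -> Optional[LicenseGroup]:
    """Look up the license group of a dataset, or None if it is not listed."""
    for group, names in MEMBERS.items():
        if name in names:
            return group
    return None


def needs_application(group: LicenseGroup) -> bool:
    """Only the research-only group is headed 'Application Required'."""
    return group is LicenseGroup.RESEARCH_ONLY


def may_allow_commercial_use(group: LicenseGroup) -> bool:
    """Assumption: only the fully open group is a candidate for commercial use."""
    return group is LicenseGroup.FULLY_OPEN
```

A result of `True` from `may_allow_commercial_use` only means the dataset is worth checking; the dataset's own license file remains the authority.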
By Scale
Large-Scale (>100k samples)
- Kinetics-700 (650k), Something-Something V2 (220k), Moments in Time (1M), NTU RGB+D 120 (114k), Skeletics-152 (150k)
Medium-Scale (10k-100k samples)
- UCF-101 (13.3k), ActivityNet (20k), NTU RGB+D 60 (57k), PKU-MMD (20k), FineGym (32k), Diving48 (18k), Toyota Smarthome (16k), HumanML3D (14k), HAA500 (10k)
Long-Duration Recordings (100+ hours)
- Ego4D (3.3k hrs), EPIC-Kitchens-100 (100 hrs), ActivityNet (648 hrs), Sussex-Huawei Locomotion (2800+ hrs)
Small but Focused (<10k samples)
- HMDB-51 (6.8k), Charades (9.8k), BEHAVE (321 seq), HOI4D (4k), InterHuman (6k), How2Sign (80 hrs)
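The sample-count buckets used in this section can be written as a small function. This is a sketch under two assumptions: the function name `scale_bucket` is hypothetical, and a count of exactly 10k is treated as medium-scale, which matches HAA500 (10k) being listed in that group. The duration-based group is a separate axis measured in hours and is not covered here.

```python
def scale_bucket(samples: int) -> str:
    """Map a sample count to the bucket names used in this section."""
    if samples > 100_000:
        return "Large-Scale"
    if samples >= 10_000:  # assumption: 10k itself counts as medium-scale
        return "Medium-Scale"
    return "Small but Focused"


# Counts taken from the lists above.
for name, samples in [("Kinetics-700", 650_000), ("UCF-101", 13_300), ("HMDB-51", 6_800)]:
    print(f"{name}: {scale_bucket(samples)}")
```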
By Year of Release
Classics (before 2015)
- UCF-101 (2012), HMDB-51 (2011), Human3.6M (2014), UCI-HAR (2012), PAMAP2 (2012), OPPORTUNITY (2013)
Established Benchmarks (2015-2019)
- Kinetics-700 (2017→2019), NTU RGB+D 60/120 (2016/2019), Something-Something V2 (2018), ActivityNet (2015), AVA (2018), AMASS (2019), Charades (2016), EPIC-Kitchens-100 (2018→2020), WISDM (2019), HAPT (2015), PKU-MMD (2017), Moments in Time (2018), Diving48 (2018)
Recent (2020-2023)
- Ego4D (2022), BABEL (2021), Toyota Smarthome (2020), BEHAVE (2022), Motion-X (2023), Ego-Exo4D (2023), HumanML3D (2022), HOI4D (2022), Skeletics-152 (2021), HAA500 (2021)
Frontier (2024+)
- InterHuman (2024), FineBio (2024)
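The release eras above follow fixed year boundaries, so assigning an era is a simple lookup. The sketch below assumes a single release year per dataset; the function name `release_era` is hypothetical, and which year to pass for datasets with several versions (for example, those written with an arrow above) is left to the caller.

```python
def release_era(year: int) -> str:
    """Map a release year to the era names used in this section."""
    if year < 2015:
        return "Classics"
    if year <= 2019:
        return "Established Benchmarks"
    if year <= 2023:
        return "Recent"
    return "Frontier"


# Years taken from the lists above.
for name, year in [("UCF-101", 2012), ("AVA", 2018), ("Ego4D", 2022), ("FineBio", 2024)]:
    print(f"{name} ({year}): {release_era(year)}")
```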