Multi-Dimensional Dataset Taxonomy

Beyond the primary modality-based organization, this page offers alternative ways to discover datasets: by primary task, license type, scale, and year of release.

By Primary Task

Action Recognition (Classify what action is happening)

Dataset | Modality | Scale
Kinetics-700 | RGB video | 650k clips / 700 classes
UCF-101 | RGB video | 13.3k clips / 101 classes
HMDB-51 | RGB video | 6.8k clips / 51 classes
Moments in Time | RGB video | 1M clips / 339 classes
HAA500 | RGB video | 10k clips / 500 classes
NTU RGB+D 60 | Skeleton + RGB | 57k seq / 60 classes
NTU RGB+D 120 | Skeleton + RGB + depth | 114k seq / 120 classes
Skeletics-152 | Estimated skeleton | 150k clips / 152 classes
Toyota Smarthome | RGB + depth | 16k clips / 31 classes

Temporal Action Detection (Detect when actions happen in untrimmed video)

Dataset | Modality | Scale
ActivityNet | RGB video | 20k videos / 200 classes
AVA | RGB video | 430 clips / 80 atomic actions
PKU-MMD | Skeleton + RGB | 20k instances / 51 classes
FineGym | RGB video | 32k segments / hierarchical
Diving48 | RGB video | 18k clips / 48 classes

Wearable / Sensor-Based HAR

Dataset | Sensors | Scale
UCI-HAR | Smartphone IMU | 30 subjects / 6 activities
PAMAP2 | IMU + HR | 9 subjects / 18 activities
WISDM | Phone + watch | 51 subjects / 18 activities
OPPORTUNITY | 72-channel wearable + ambient | 4 subjects
HAPT | Smartphone IMU | 30 subjects / 12 activities
RealWorld HAR | Phone + watch | 15 subjects / 8 activities
mHealth | Body sensors + ECG | 10 subjects / 12 activities
UniMiB-SHAR | Smartphone accelerometer | 30 subjects / 17 activities
Daphnet | Wearable accelerometer | 10 subjects / freezing of gait
Sussex-Huawei Locomotion | Smartphones (4 body positions) | 3 users / 2800+ hours

3D Pose Estimation & Motion Capture

Dataset | Modality | Scale
AMASS | SMPL parameters | 16k mins / 344 subjects
Human3.6M | Mocap + RGB | 3.6M frames
BABEL | SMPL + text | 43 hrs / 3.7k sequences
TotalCapture | Mocap + multi-view + IMU | 5 subjects

Egocentric / First-Person Vision

Dataset | Modality | Scale
EPIC-Kitchens-100 | Ego RGB + audio | 100 hrs / 700 videos / 45 kitchens
Ego4D | Ego RGB + stereo + audio | 3.3k hrs
Ego-Exo4D | Ego + exo RGB | 1.4k seq / 20 hrs
HOI4D | Ego RGB-D + hand pose | 4k+ clips

Motion Generation & Language-Motion

Dataset | Modality | Scale
HumanML3D | SMPL + text | 14k+ sequences
InterHuman | SMPL-X + text | 6k+ interactions
BABEL | SMPL + text | 43 hrs / 250 action classes
Motion-X | Full-body mocap | 2M frames

Human-Object Interaction

Dataset | Modality | Scale
BEHAVE | RGB-D + pose | 321 seq / 8 subjects / 20 objects
HOI4D | Ego RGB-D | 4k+ clips
Something-Something V2 | RGB video | 220k clips

Multi-Person Interaction

Dataset | Modality | Scale
NTU Mutual Actions | RGB + depth + skeleton | 26 interaction classes
InterHuman | SMPL-X + text | 6k+ interactions

Dense Video Captioning / Video-Language

Dataset | Modality | Scale
ActivityNet Captions | RGB + text | 20k videos / 100k captions
Charades | RGB + scripts | 9.8k videos

Sign Language

Dataset | Modality | Scale
How2Sign | RGB + depth + pose | 80 hrs / ASL

Clinical / Health

Dataset | Application | Scale
Daphnet | Parkinson's gait freezing | 10 subjects
mHealth | Mobile health monitoring | 10 subjects
FineBio | Biology lab procedures | Multi-step

By License Type

Fully Open (CC BY, Apache, MIT)

  • Kinetics-700, Something-Something V2, UCI-HAR, PAMAP2, HAPT, RealWorld HAR, mHealth, TotalCapture

Creative Commons Non-Commercial

  • Charades (CC BY-NC), WISDM (CC BY-NC-SA), OPPORTUNITY (CC BY-NC)

Research-Only (Application Required)

  • NTU RGB+D 60/120, Human3.6M, PKU-MMD, EPIC-Kitchens-100, Toyota Smarthome

Non-Commercial Research License

  • Ego4D, BABEL, AMASS, Ego-Exo4D, Motion-X, HumanML3D, InterHuman

Platform Terms / Custom License

  • UCF-101, HMDB-51, ActivityNet, AVA, Moments in Time, BEHAVE

By Scale

Large-Scale (>100k samples)

  • Kinetics-700 (650k), Something-Something V2 (220k), Moments in Time (1M), NTU RGB+D 120 (114k), Skeletics-152 (150k)

Medium-Scale (10k-100k samples)

  • UCF-101 (13.3k), ActivityNet (20k), NTU RGB+D 60 (57k), PKU-MMD (20k), FineGym (32k), Diving48 (18k), Toyota Smarthome (16k), HumanML3D (14k), HAA500 (10k)

Long-Duration Recordings (100+ hours)

  • Ego4D (3.3k hrs), Sussex-Huawei Locomotion (2800+ hrs), ActivityNet (648 hrs), EPIC-Kitchens-100 (100 hrs)

Small but Focused (<10k samples)

  • HMDB-51 (6.8k), Charades (9.8k), BEHAVE (321 seq), HOI4D (4k), InterHuman (6k), How2Sign (80 hrs)
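
The sample-count thresholds used in this section can be written as a small helper. This is only an illustrative sketch: the function name `scale_bucket` and the bucket labels are hypothetical, and the counts are the rounded figures quoted in the lists above. Boundary values follow the headings as written (exactly 10k counts as medium, exactly 100k does not count as large). Datasets measured in hours rather than samples are not covered.

```python
def scale_bucket(num_samples: int) -> str:
    """Map a sample count to the scale buckets used on this page."""
    if num_samples > 100_000:
        return "large"   # Large-Scale (>100k samples)
    if num_samples >= 10_000:
        return "medium"  # Medium-Scale (10k-100k samples)
    return "small"       # Small but Focused (<10k samples)


# Rounded counts taken from the lists above.
counts = {
    "Kinetics-700": 650_000,
    "NTU RGB+D 60": 57_000,
    "HAA500": 10_000,
    "HMDB-51": 6_800,
}

for name, n in counts.items():
    print(f"{name}: {scale_bucket(n)}")
```

With these inputs, Kinetics-700 lands in the large bucket, NTU RGB+D 60 and HAA500 in medium, and HMDB-51 in small, matching the groupings above.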

By Year of Release

Classics (before 2015)

  • UCF-101 (2012), HMDB-51 (2011), Human3.6M (2014), UCI-HAR (2012), PAMAP2 (2012), OPPORTUNITY (2013)

Established Benchmarks (2015-2019)

  • Kinetics-700 (2017→2019), NTU RGB+D 60/120 (2016/2019), Something-Something V2 (2018), Moments in Time (2018), ActivityNet (2015), AVA (2018), AMASS (2019), Charades (2016), EPIC-Kitchens-100 (2018→2020), PKU-MMD (2017), WISDM (2019), HAPT (2015)

Recent (2020-2023)

  • Ego4D (2022), BABEL (2021), HAA500 (2021), Diving48 (2020), Toyota Smarthome (2020), BEHAVE (2022), Motion-X (2023), Ego-Exo4D (2023), HumanML3D (2022), HOI4D (2022), Skeletics-152 (2021)

Frontier (2024+)

  • InterHuman (2024), FineBio (2024)
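
Because each dataset appears under several of the dimensions on this page, the groupings can be combined into a single query. The sketch below is a hypothetical illustration, not a tool shipped with this page: `DatasetEntry`, `CATALOG`, `era`, and `find` are invented names, the facet labels are shortened versions of the headings above, and only a handful of entries are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    task: str
    license_group: str
    year: int


# A few entries copied from the task, license, and year lists on this page.
CATALOG = [
    DatasetEntry("UCF-101", "action recognition", "platform/custom", 2012),
    DatasetEntry("Kinetics-700", "action recognition", "fully open", 2019),
    DatasetEntry("UCI-HAR", "wearable HAR", "fully open", 2012),
    DatasetEntry("PAMAP2", "wearable HAR", "fully open", 2012),
    DatasetEntry("HumanML3D", "motion generation", "non-commercial research", 2022),
]


def era(year: int) -> str:
    """Map a release year to the era headings used in 'By Year of Release'."""
    if year < 2015:
        return "classics"
    if year <= 2019:
        return "established"
    if year <= 2023:
        return "recent"
    return "frontier"


def find(catalog, *, task=None, license_group=None, era_name=None):
    """Return names of entries matching every facet that is not None."""
    return [
        e.name
        for e in catalog
        if (task is None or e.task == task)
        and (license_group is None or e.license_group == license_group)
        and (era_name is None or era(e.year) == era_name)
    ]


print(find(CATALOG, license_group="fully open", era_name="classics"))
# prints ['UCI-HAR', 'PAMAP2']
```

Leaving a facet as `None` means "do not filter on this dimension", so the same function answers single-dimension questions (all fully open datasets) and combined ones (fully open datasets released before 2015).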