Multi-Dimensional Dataset Taxonomy

Beyond the primary modality-based organization, this page offers alternative ways to discover datasets: by primary task, license type, scale, and year of release.

By Primary Task

Action Recognition (Classify what action is happening)

Dataset	Modality	Scale
Kinetics-700	RGB video	650k clips / 700 classes
UCF-101	RGB video	13.3k clips / 101 classes
HMDB-51	RGB video	6.8k clips / 51 classes
Moments in Time	RGB video	1M clips / 339 classes
HAA500	RGB video	10k clips / 500 classes
NTU RGB+D 60	Skeleton + RGB	57k seq / 60 classes
NTU RGB+D 120	Skeleton + RGB + depth	114k seq / 120 classes
Skeletics-152	Estimated skeleton	150k clips / 152 classes
Toyota Smarthome	RGB + depth	16k clips / 31 classes

Temporal Action Detection (Detect when actions happen in untrimmed video)

Dataset	Modality	Scale
ActivityNet	RGB video	20k videos / 200 classes
AVA	RGB video	430 clips / 80 atomic actions
PKU-MMD	Skeleton + RGB	20k instances / 51 classes
FineGym	RGB video	32k segments / hierarchical
Diving48	RGB video	18k clips / 48 classes

Wearable / Sensor-Based HAR

Dataset	Sensors	Scale
UCI-HAR	Smartphone IMU	30 subjects / 6 activities
PAMAP2	IMU + HR	9 subjects / 18 activities
WISDM	Phone + watch	51 subjects / 18 activities
OPPORTUNITY	72-channel wearable + ambient	4 subjects
HAPT	Smartphone IMU	30 subjects / 12 activities
RealWorld HAR	Phone + watch	60 subjects / 15 activities
mHealth	Body sensors + ECG	10 subjects / 12 activities
UniMiB-SHAR	Smartphone accelerometer	30 subjects / 17 activities
Daphnet	Wearable accelerometer	10 subjects / gait freezing
Sussex-Huawei Locomotion	Phone + watch	3 users / 2800+ hours

3D Pose Estimation & Motion Capture

Dataset	Modality	Scale
AMASS	SMPL parameters	16k mins / 344 subjects
Human3.6M	Mocap + RGB	3.6M frames
BABEL	SMPL + text	43 hrs / 3.7k sequences
TotalCapture	Mocap + multi-view + IMU	5 subjects

Egocentric / First-Person Vision

Dataset	Modality	Scale
EPIC-Kitchens-100	Ego RGB + audio	700 hrs / 90 kitchens
Ego4D	Ego RGB + stereo + audio	3.3k hrs
Ego-Exo4D	Ego + exo RGB	1.4k seq / 20 hrs
HOI4D	Ego RGB-D + hand pose	4k+ clips

Motion Generation & Language-Motion

Dataset	Modality	Scale
HumanML3D	SMPL + text	14k+ sequences
InterHuman	SMPL-X + text	6k+ interactions
BABEL	SMPL + text	43 hrs / 250 action classes
Motion-X	Full-body mocap	2M frames

Human-Object Interaction

Dataset	Modality	Scale
BEHAVE	RGB-D + pose	321 seq / 20 subjects
HOI4D	Ego RGB-D	4k+ clips
Something-Something V2	RGB video	220k clips

Multi-Person Interaction

Dataset	Modality	Scale
NTU Mutual Actions	RGB + depth + skeleton	26 interaction classes
InterHuman	SMPL-X + text	6k+ interactions

Dense Video Captioning / Video-Language

Dataset	Modality	Scale
ActivityNet Captions	RGB + text	20k videos / 100k captions
Charades	RGB + scripts	9.8k videos

Sign Language

Dataset	Modality	Scale
How2Sign	RGB + depth + pose	80 hrs / ASL

Clinical / Health

Dataset	Application	Scale
Daphnet	Parkinson's gait freezing	10 subjects
mHealth	Mobile health monitoring	10 subjects
FineBio	Biology lab procedures	Multi-step
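
Several datasets are listed under more than one task above (BABEL, HOI4D, and InterHuman, for example). The following is a minimal Python sketch of how a task-to-dataset mapping could be inverted to find every task a dataset appears under. The names `TASKS`, `invert`, and `tasks_for` are hypothetical, and only a subset of the tables is transcribed here.

```python
from collections import defaultdict

# Subset of the task tables above, transcribed by hand (illustrative, not exhaustive).
TASKS = {
    "3D Pose Estimation & Motion Capture": ["AMASS", "Human3.6M", "BABEL", "TotalCapture"],
    "Egocentric / First-Person Vision": ["EPIC-Kitchens-100", "Ego4D", "Ego-Exo4D", "HOI4D"],
    "Motion Generation & Language-Motion": ["HumanML3D", "InterHuman", "BABEL", "Motion-X"],
    "Human-Object Interaction": ["BEHAVE", "HOI4D", "Something-Something V2"],
    "Multi-Person Interaction": ["NTU Mutual Actions", "InterHuman"],
}


def invert(tasks):
    """Build a dataset -> [tasks] index from a task -> [datasets] mapping."""
    index = defaultdict(list)
    for task, datasets in tasks.items():
        for name in datasets:
            index[name].append(task)
    return dict(index)


DATASET_TASKS = invert(TASKS)


def tasks_for(name):
    """Return the tasks a dataset is listed under (empty list if unknown)."""
    return DATASET_TASKS.get(name, [])


# Datasets that appear in more than one of the transcribed task tables.
multi_task = sorted(n for n, t in DATASET_TASKS.items() if len(t) > 1)
print(multi_task)
```

Keeping the mapping keyed by task mirrors the layout of this page; the inverted index is derived rather than maintained by hand, so the two views cannot drift apart.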

By License Type

Fully Open (CC BY, Apache, MIT)

Kinetics-700, Something-Something V2, UCI-HAR, PAMAP2, HAPT, RealWorld HAR, mHealth, TotalCapture

Creative Commons Non-Commercial

Charades (CC BY-NC), WISDM (CC BY-NC-SA), OPPORTUNITY (CC BY-NC)

Research-Only (Application Required)

NTU RGB+D 60/120, Human3.6M, PKU-MMD, EPIC-Kitchens-100, Toyota Smarthome

Non-Commercial Research License

Ego4D, BABEL, AMASS, Ego-Exo4D, Motion-X, HumanML3D, InterHuman

Platform Terms / Custom License

UCF-101, HMDB-51, ActivityNet, AVA, Moments in Time, BEHAVE
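
The license groups above can serve as a first-pass filter when shortlisting datasets. Below is a minimal Python sketch under stated assumptions: `LICENSE_GROUPS`, `license_group`, and `shortlist` are hypothetical names, the group membership is copied from the lists on this page, and the default of accepting only the "Fully Open" group is an illustrative choice rather than legal guidance. The dataset's own license text remains the authority.

```python
# License groups transcribed from the lists above; always confirm against the
# dataset's own license text before use.
LICENSE_GROUPS = {
    "Fully Open": [
        "Kinetics-700", "Something-Something V2", "UCI-HAR", "PAMAP2",
        "HAPT", "RealWorld HAR", "mHealth", "TotalCapture",
    ],
    "Creative Commons Non-Commercial": ["Charades", "WISDM", "OPPORTUNITY"],
    "Research-Only": [
        "NTU RGB+D 60", "NTU RGB+D 120", "Human3.6M", "PKU-MMD",
        "EPIC-Kitchens-100", "Toyota Smarthome",
    ],
    "Non-Commercial Research License": [
        "Ego4D", "BABEL", "AMASS", "Ego-Exo4D", "Motion-X", "HumanML3D", "InterHuman",
    ],
    "Platform Terms / Custom License": [
        "UCF-101", "HMDB-51", "ActivityNet", "AVA", "Moments in Time", "BEHAVE",
    ],
}


def license_group(name):
    """Return the license group a dataset is listed under, or None if unlisted."""
    for group, names in LICENSE_GROUPS.items():
        if name in names:
            return group
    return None


def shortlist(candidates, allowed_groups=("Fully Open",)):
    """Keep only candidates whose license group is in allowed_groups.

    Unlisted datasets are dropped, which is the conservative default.
    """
    return [n for n in candidates if license_group(n) in allowed_groups]


print(shortlist(["Kinetics-700", "Ego4D", "UCI-HAR"]))
```

Dropping unlisted names by default means a typo or a dataset missing from this page never slips through as if it were permissively licensed.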

By Scale

Large-Scale (>100k samples)

Kinetics-700 (650k), Something-Something V2 (220k), Moments in Time (1M), NTU RGB+D 120 (114k), Skeletics-152 (150k)

Medium-Scale (10k-100k samples)

UCF-101 (13.3k), ActivityNet (20k), NTU RGB+D 60 (57k), PKU-MMD (20k), FineGym (32k), Diving48 (18k), Toyota Smarthome (16k), HumanML3D (14k), HAA500 (10k)

Long-Duration Recordings (>100 hours)

Ego4D (3.3k hrs), EPIC-Kitchens-100 (700 hrs), ActivityNet (648 hrs), Sussex-Huawei Locomotion (2800+ hrs)

Small but Focused (<10k samples)

HMDB-51 (6.8k), Charades (9.8k), BEHAVE (321 seq), HOI4D (4k), InterHuman (6k), How2Sign (80 hrs)
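
The sample-count thresholds above can be applied mechanically. Here is a minimal Python sketch, assuming the shorthand used on this page (`650k`, `13.3k`, `1M`); `parse_count` and `scale_bucket` are hypothetical helper names. Treating exactly 100k as medium and exactly 10k as medium is an assumption taken from the headings (">100k", "10k-100k", "<10k"). Entries given in hours or sequences rather than samples are outside its scope.

```python
def parse_count(text):
    """Convert shorthand such as '650k', '13.3k', or '1M' to an integer."""
    text = text.strip()
    multipliers = {"k": 1_000, "M": 1_000_000}
    suffix = text[-1]
    if suffix in multipliers:
        # round() absorbs floating-point noise, e.g. 13.3 * 1000.
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


def scale_bucket(samples):
    """Map a sample count to one of the sample-count buckets used on this page."""
    if samples > 100_000:
        return "Large-Scale"
    if samples >= 10_000:
        return "Medium-Scale"
    return "Small but Focused"


# A few entries from the lists above.
for name, raw in [
    ("Kinetics-700", "650k"),
    ("UCF-101", "13.3k"),
    ("HMDB-51", "6.8k"),
    ("Moments in Time", "1M"),
]:
    print(name, parse_count(raw), scale_bucket(parse_count(raw)))
```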

By Year of Release

Classics (before 2015)

UCF-101 (2012), HMDB-51 (2011), Human3.6M (2014), UCI-HAR (2012), PAMAP2 (2012), OPPORTUNITY (2013)

Established Benchmarks (2015-2019)

Kinetics-700 (2017→2019), NTU RGB+D 60/120 (2016/2019), Something-Something V2 (2018), ActivityNet (2015), AVA (2018), AMASS (2019), Charades (2016), EPIC-Kitchens-100 (2018→2020), WISDM (2019), HAPT (2015)

Recent (2020-2023)

Ego4D (2022), BABEL (2022), Moments in Time (2021), Diving48 (2020), Toyota Smarthome (2020), BEHAVE (2022), Motion-X (2023), Ego-Exo4D (2023), HumanML3D (2022), HOI4D (2022), PKU-MMD (2022), Skeletics-152 (2021)

Frontier (2024+)

InterHuman (2024), FineBio (2024), HAA500 (2024)
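
The year ranges above translate directly into a lookup. The Python sketch below uses hypothetical helpers `first_year` and `era`. It assumes that a versioned label such as `2018→2020` is bucketed by its first year, which matches how Kinetics-700 and EPIC-Kitchens-100 are placed above. Slash-separated labels such as `2016/2019` are not handled.

```python
def first_year(label):
    """Return the first year in a label such as '2012' or '2018→2020'."""
    return int(label.split("→")[0])


def era(year):
    """Map a release year to the era headings used on this page."""
    if year < 2015:
        return "Classics"
    if year <= 2019:
        return "Established Benchmarks"
    if year <= 2023:
        return "Recent"
    return "Frontier"


# A few entries from the lists above.
for name, label in [
    ("UCF-101", "2012"),
    ("EPIC-Kitchens-100", "2018→2020"),
    ("Ego4D", "2022"),
    ("FineBio", "2024"),
]:
    print(name, era(first_year(label)))
```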