NTU RGB+D 120¶
- Modality: RGB video, depth maps, 3D skeleton sequences (Microsoft Kinect V2)
- Primary Tasks: 3D action recognition, skeleton-based recognition, cross-view transfer
- Scale: 114,480 trimmed sequences, 120 action classes, 106 subjects, 32 setups
- License: Research-only; requires signing usage agreement with NTU RGB+D team
- Access: http://rose1.ntu.edu.sg/datasets/actionrecognition.asp
Summary¶
NTU RGB+D 120 expands the seminal NTU-60 dataset with new action classes, subjects, and capture setups. Each recording provides synchronized RGB, depth, infrared, and skeleton data, supporting both vision-based and graph convolution approaches to 3D action recognition.
Reference Paper¶
- J. Liu et al. "NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding." IEEE TPAMI, 2020.
PDF
Benchmarks & Baselines¶
- Two-Stream Adaptive GCN - Cross-Subject Top-1: 86.3%, Cross-Setup Top-1: 88.9%; Liu et al., CVPR 2019.
- Shift-GCN - Cross-Subject Top-1: 85.9%, Cross-Setup Top-1: 87.6%; Cheng et al., CVPR 2020.
- Official evaluation splits: Cross-Subject and Cross-Setup. Recent works often report Cross-View as well.
Tooling & Ecosystem¶
- OpenMMLab's MMAction2 includes dataloaders and benchmarking configs.
- PyTorch Geometric Temporal has skeleton-based examples.
- ntu-download-scripts host utilities for downloading and parsing skeleton data.
Known Challenges¶
- Access requires manual approval and file hosting uses split
.ziparchives; downloads can be slow. - Skeleton annotations use Kinect joint order; conversions to SMPL or CMU indexes require mapping scripts.
- Class imbalance exists for infrequent "interaction" actions; consider reweighting.