UCI-HAR

Modality: Smartphone IMU (accelerometer + gyroscope)
Primary Tasks: Human activity recognition from inertial sensors
Scale: 30 subjects, 6 activity classes, 10,299 samples (2.56-second windows at 50 Hz, i.e. 128 readings per window)
License: Public domain (UCI Machine Learning Repository)
Access: https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones

Summary

UCI-HAR is one of the most widely used benchmarks for smartphone-based human activity recognition. The dataset was collected from 30 volunteers (aged 19-48), each wearing a Samsung Galaxy S II on the waist. Each subject performed six activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying down. Tri-axial accelerometer and gyroscope signals were captured at 50 Hz, preprocessed with noise filters, and segmented into fixed-width sliding windows of 2.56 seconds (128 readings) with 50% overlap. The dataset provides both the windowed inertial signals and a 561-dimensional vector of time-domain and frequency-domain features per window, so it suits both deep learning and classical ML approaches.
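The windowing described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' preprocessing code: the function name `sliding_windows` and the synthetic signal are assumptions made for the example, and the noise filtering step is omitted.

```python
def sliding_windows(signal, window_size=128, overlap=0.5):
    """Split a sequence of samples into fixed-width, overlapping windows.

    128 samples correspond to 2.56 s at 50 Hz; overlap=0.5 gives a step
    of 64 samples between consecutive window starts.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(window_size * (1 - overlap)))
    # Trailing samples that do not fill a whole window are dropped.
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


SAMPLING_RATE_HZ = 50
WINDOW_SECONDS = 2.56
window_size = round(SAMPLING_RATE_HZ * WINDOW_SECONDS)  # 128 samples

# Ten seconds of a synthetic single-axis signal (500 samples).
signal = list(range(500))
windows = sliding_windows(signal, window_size=window_size, overlap=0.5)
print(len(windows), "windows of", len(windows[0]), "samples each")
```

With 50% overlap, every reading except those near the two ends of the recording appears in two windows, which is why consecutive samples in the dataset are strongly correlated.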

Reference Paper

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge L. Reyes-Ortiz. "A Public Domain Dataset for Human Activity Recognition Using Smartphones." ESANN, 2013.

Benchmarks & Baselines

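SVM (561 features) - Accuracy: 96.0% (Anguita et al., ESANN 2013).
DeepConvLSTM - Accuracy: ~95.8% (architecture from Ordóñez & Roggen, Sensors 2016; that paper evaluated on OPPORTUNITY and Skoda, so this figure reflects later re-implementations on UCI-HAR).
1D-CNN - Accuracy: ~96.4% (commonly reported in deep learning HAR literature; exact figures vary with architecture and preprocessing).
Standard evaluation uses the predefined 70/30 subject-wise train/test split (21 train subjects with 7,352 windows, 9 test subjects with 2,947 windows), so no subject appears in both partitions.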
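When building a custom split (for example for cross-validation), the same subject-wise principle should hold: all windows from one volunteer stay on the same side. The sketch below shows one way to do that with the standard library. The function `subject_wise_split` and the synthetic subject IDs are hypothetical, and the random assignment is not the official train/test assignment shipped with the dataset.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that no subject spans both partitions."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = round(len(subjects) * test_fraction)
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Synthetic stand-in: 30 subjects with 4 windows each.
subject_ids = [subject for subject in range(1, 31) for _ in range(4)]
train_idx, test_idx = subject_wise_split(subject_ids, test_fraction=0.3)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print(len(train_subjects), "train subjects,", len(test_subjects), "test subjects")
```

Splitting by window instead of by subject would leak overlapping windows from the same person into both partitions and inflate the reported accuracy.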

Tooling & Ecosystem

Available directly from the UCI ML Repository.
Widely available in tutorial form for scikit-learn, TensorFlow, and PyTorch.
Pre-extracted 561 features enable immediate use without signal processing.
TensorFlow Datasets community contributions include UCI-HAR.
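Reading the pre-extracted features needs no special tooling. The sketch below assumes the extracted archive layout of `train/X_train.txt` (one whitespace-separated row per window), `train/y_train.txt` (one label ID from 1 to 6 per line), and `train/subject_train.txt` (one subject ID per line), with the same pattern for `test`. The helper names `read_column` and `load_split` are hypothetical, and a tiny synthetic fixture stands in for the real files so the example runs on its own.

```python
import tempfile
from pathlib import Path

# Label IDs as listed in the archive's activity_labels.txt
# (the dataset spells the lying-down class "LAYING").
ACTIVITY_NAMES = {
    1: "WALKING",
    2: "WALKING_UPSTAIRS",
    3: "WALKING_DOWNSTAIRS",
    4: "SITTING",
    5: "STANDING",
    6: "LAYING",
}


def read_column(path, cast):
    """Read a file holding one value per line."""
    with open(path) as handle:
        return [cast(line) for line in handle if line.strip()]


def load_split(root, split):
    """Load features, labels, and subject IDs for 'train' or 'test'."""
    split_dir = Path(root) / split
    with open(split_dir / f"X_{split}.txt") as handle:
        features = [
            [float(value) for value in line.split()]
            for line in handle
            if line.strip()
        ]
    labels = read_column(split_dir / f"y_{split}.txt", int)
    subjects = read_column(split_dir / f"subject_{split}.txt", int)
    if not len(features) == len(labels) == len(subjects):
        raise ValueError("feature, label, and subject files disagree on row count")
    return features, labels, subjects


# Synthetic fixture: 3 columns instead of 561, 2 rows instead of thousands.
with tempfile.TemporaryDirectory() as root:
    train_dir = Path(root) / "train"
    train_dir.mkdir()
    (train_dir / "X_train.txt").write_text(
        "  2.5e-001 -1.0e+000  3.0e-002\n  1.0e-001  0.0e+000 -5.0e-001\n"
    )
    (train_dir / "y_train.txt").write_text("5\n4\n")
    (train_dir / "subject_train.txt").write_text("1\n1\n")
    features, labels, subjects = load_split(root, "train")

activity_names = [ACTIVITY_NAMES[label] for label in labels]
print(activity_names, len(features[0]), "features per window")
```

For the real data, point `load_split` at the extracted dataset directory; the resulting lists can be passed to scikit-learn or converted to arrays for TensorFlow or PyTorch.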

Known Challenges

Only 6 activity classes; the task is considered largely solved for this label set.
Single sensor placement (waist) limits generalizability to other body positions.
Lab-controlled conditions do not reflect real-world deployment variability.
Static activities (sitting vs. standing) are the primary source of confusion.
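The 561 handcrafted features are tied to this dataset's sensors, placement, and preprocessing, so they may not transfer to other setups; recent work generally prefers learning directly from the windowed inertial signals.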
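Overall accuracy hides where the errors fall, so a confusion matrix is the usual way to check the sitting vs. standing issue noted above. The sketch below is a minimal illustration: the `confusion_matrix` helper is written for this example, only three classes are shown, and the predictions are made up rather than taken from any model.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


labels = ["WALKING", "SITTING", "STANDING"]

# Made-up predictions for illustration only, not results from any model.
y_true = ["WALKING"] * 5 + ["SITTING"] * 5 + ["STANDING"] * 5
y_pred = (
    ["WALKING"] * 5
    + ["SITTING"] * 3 + ["STANDING"] * 2
    + ["STANDING"] * 4 + ["SITTING"] * 1
)

matrix = confusion_matrix(y_true, y_pred, labels)
sitting, standing = labels.index("SITTING"), labels.index("STANDING")
sitting_standing_errors = matrix[sitting][standing] + matrix[standing][sitting]

for label, row in zip(labels, matrix):
    print(f"{label:<10}", row)
print("sitting/standing confusions:", sitting_standing_errors)
```

Reporting the off-diagonal cells for the static classes alongside overall accuracy makes results on this benchmark easier to compare.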

Cite

@inproceedings{anguita2013public,
  title     = {A Public Domain Dataset for Human Activity Recognition Using Smartphones},
  author    = {Anguita, Davide and Ghio, Alessandro and Oneto, Luca and Parra, Xavier and Reyes-Ortiz, Jorge L.},
  booktitle = {European Symposium on Artificial Neural Networks (ESANN)},
  year      = {2013}
}