BABEL¶

Modality: Motion capture sequences with SMPL parameters, natural language labels, action segmentation
Primary Tasks: Motion-language alignment, action segmentation, motion synthesis, retrieval
Scale: 43 hours of motion, 3.7k sequences, 250 action classes, dense textual annotations
License: BABEL research license (non-commercial, attribution)
Access: https://babel.is.tue.mpg.de/

Summary¶

BABEL enriches AMASS motion data with temporally localized action labels and free-form text descriptions, enabling cross-modal learning between motion and language. It supports research in sequence segmentation, motion captioning, and retrieval that bridges mocap with semantic understanding.

Reference Paper¶

Pavlakos et al. "BABEL: Bodies, Actions and Behavior with English Labels." CVPR, 2022. PDF

Benchmarks & Baselines¶

Transformer Segmentation - mIOU: 77.1 for action segmentation; Pavlakos et al., 2022.
Motion-to-Text Retrieval - R@1: 31.6; Pavlakos et al., 2022.
Evaluation protocols include seen/unseen subject splits and long vs. short sequence breakdowns.

Tooling & Ecosystem¶

Official BABEL toolkit for downloading metadata and aligning with AMASS sequences.
Text2Poser demonstrates motion-language modeling using BABEL.
Integrates smoothly with SMPL-X and downstream generative models.

Known Challenges¶

Requires prior access to AMASS data; ensure both licenses are met.
Text annotations contain free-form phrasing; preprocessing (lemmatization, synonyms) improves alignment.
Some sequences have overlapping labels; segmentation tasks must handle multi-label intervals.

Cite¶

@inproceedings{pavlakos2022babel,
  title     = {BABEL: Bodies, Actions and Behaviors with English Labels},
  author    = {Pavlakos, Georgios and Choutas, Vasileios and Bolkart, Timo and others},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}