BABEL
- Modality: Motion capture sequences with SMPL parameters, natural language labels, action segmentation
- Primary Tasks: Motion-language alignment, action segmentation, motion synthesis, retrieval
- Scale: About 43 hours of motion, 3.7k sequences, over 250 action categories, dense sequence-level and frame-level text annotations
- License: BABEL research license (non-commercial, attribution)
- Access: https://babel.is.tue.mpg.de/
Summary
BABEL enriches AMASS motion data with temporally localized action labels and free-form text descriptions, enabling cross-modal learning between motion and language. It supports research in sequence segmentation, motion captioning, and retrieval that bridges mocap with semantic understanding.
Reference Paper
- Punnakkal et al. "BABEL: Bodies, Action and Behavior with English Labels." CVPR, 2021.
Benchmarks & Baselines
- Transformer Segmentation - mIoU: 77.1 for action segmentation; Punnakkal et al., 2021.
- Motion-to-Text Retrieval - R@1: 31.6; Punnakkal et al., 2021.
- Evaluation protocols include seen/unseen subject splits and long vs. short sequence breakdowns.
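As an illustration of the segmentation metric, the sketch below computes a frame-level mean intersection-over-union from two per-frame label sequences. It shows one common definition (per-class IoU averaged over the classes that occur in either sequence). The exact protocol behind the reported figure is not stated here and may differ, for example in how unlabeled frames are treated. The function name `mean_iou` and the `ignore_label` parameter are hypothetical.

```python
from collections import defaultdict


def mean_iou(pred, target, ignore_label=None):
    """Frame-level mean IoU over classes that occur in pred or target.

    pred, target: equal-length sequences with one class label per frame.
    ignore_label: frames whose target equals this value are skipped
                  (an assumption, not part of any official protocol).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")

    inter = defaultdict(int)         # frames where pred == target == class
    pred_count = defaultdict(int)    # frames predicted as class
    target_count = defaultdict(int)  # frames annotated as class

    for p, t in zip(pred, target):
        if ignore_label is not None and t == ignore_label:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    classes = set(pred_count) | set(target_count)
    if not classes:
        return 0.0

    ious = []
    for c in classes:
        union = pred_count[c] + target_count[c] - inter[c]
        ious.append(inter[c] / union)  # union > 0 because c occurs somewhere
    return sum(ious) / len(ious)


# Toy example with four frames and two classes.
pred = ["walk", "walk", "sit", "sit"]
target = ["walk", "sit", "sit", "sit"]
score = mean_iou(pred, target)
```

In the toy example, "walk" has IoU 1/2 and "sit" has IoU 2/3, so the mean is 7/12. Computing the score separately per subset (seen vs. unseen subjects, long vs. short sequences) only requires calling the function on each subset's frames.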
Tooling & Ecosystem
- Official BABEL toolkit for downloading metadata and aligning with AMASS sequences.
- Text2Poser demonstrates motion-language modeling using BABEL.
- Annotations are keyed to AMASS sequences, so they can be paired with SMPL-family body parameters (e.g., SMPL-X) for downstream generative models.
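Aligning annotations with AMASS sequences might look like the following sketch. It assumes each annotation entry carries a relative motion-file path (`feat_p`), a duration in seconds (`dur`), optional frame-level labels (`frame_ann`) with `start_t`/`end_t` times, and sequence-level labels (`seq_ann`). These field names and the path layout are assumptions about the released JSON and should be checked against the official toolkit. The helper names are hypothetical, and the toy entry is made up for illustration.

```python
from pathlib import Path


def segments_for(entry):
    """Return (start_s, end_s, label) tuples for one annotation entry.

    Uses frame-level labels when present; otherwise falls back to the
    sequence-level labels, each spanning the whole sequence.
    Field names are assumptions about the annotation JSON.
    """
    frame_ann = entry.get("frame_ann")
    if frame_ann:
        return [
            (lab["start_t"], lab["end_t"], lab["proc_label"])
            for lab in frame_ann["labels"]
        ]
    seq_ann = entry.get("seq_ann") or {}
    return [
        (0.0, entry["dur"], lab["proc_label"])
        for lab in seq_ann.get("labels", [])
    ]


def index_annotations(babel, amass_root):
    """Map each sequence id to its AMASS motion file and labeled segments."""
    root = Path(amass_root)
    return {
        sid: {
            # Assumption: feat_p is relative to the local AMASS root;
            # adjust if your download uses a different folder layout.
            "motion_file": root / entry["feat_p"],
            "segments": segments_for(entry),
        }
        for sid, entry in babel.items()
    }


# Toy entry (made up) standing in for a parsed annotation file.
toy = {
    "0001": {
        "feat_p": "CMU/86/86_01_poses.npz",
        "dur": 4.0,
        "seq_ann": {"labels": [{"proc_label": "walk"}]},
        "frame_ann": {
            "labels": [
                {"proc_label": "walk", "start_t": 0.0, "end_t": 3.0},
                {"proc_label": "wave", "start_t": 2.0, "end_t": 4.0},
            ]
        },
    }
}
index = index_annotations(toy, "amass")
```

With real data, the dictionary would come from `json.load` on an annotation file, and each `motion_file` could then be opened with the loader used for AMASS archives.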
Known Challenges
- BABEL provides annotations only, so the motion data must be obtained from AMASS separately; comply with both licenses.
- Text annotations contain free-form phrasing; preprocessing (e.g., lemmatization, synonym merging) can improve alignment.
- Some sequences have overlapping labels; segmentation tasks must handle multi-label intervals.
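The last two points can be sketched together: normalize free-form labels to canonical class names, then rasterize possibly overlapping segments into per-frame multi-hot targets. This is a minimal illustration, not the dataset's official preprocessing. The `SYNONYMS` table, the function names, and the frame rate are all hypothetical, and a real pipeline would likely use a proper lemmatizer instead of a lookup table.

```python
import re

# Hypothetical synonym table mapping cleaned phrases to canonical classes.
SYNONYMS = {
    "walking": "walk",
    "walks": "walk",
    "stroll": "walk",
    "sitting": "sit",
    "sit down": "sit",
}


def normalize_label(text):
    """Lowercase, drop non-letters, collapse whitespace, merge synonyms."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    cleaned = " ".join(cleaned.split())
    return SYNONYMS.get(cleaned, cleaned)


def multi_hot_frames(segments, duration_s, fps, classes):
    """Build an n_frames x n_classes 0/1 matrix (list of lists).

    segments: iterable of (start_s, end_s, label); overlaps are allowed,
              so a frame may have several classes set to 1.
    Labels that do not normalize to a known class are skipped.
    """
    n_frames = int(round(duration_s * fps))
    column = {c: i for i, c in enumerate(classes)}
    targets = [[0] * len(classes) for _ in range(n_frames)]

    for start_s, end_s, label in segments:
        col = column.get(normalize_label(label))
        if col is None:
            continue
        # Segment times are rounded to frame indices and clipped to range.
        first = max(0, int(round(start_s * fps)))
        last = min(n_frames, int(round(end_s * fps)))
        for frame in range(first, last):
            targets[frame][col] = 1
    return targets


# Toy example: two overlapping segments at 2 frames per second.
segments = [(0.0, 1.0, "Walking"), (0.5, 1.5, "wave")]
targets = multi_hot_frames(segments, duration_s=1.5, fps=2, classes=["walk", "wave"])
```

In the toy example the middle frame belongs to both "walk" and "wave". A segmentation model trained on such targets would therefore need a multi-label loss (for instance, an independent binary decision per class) in place of a single softmax over classes.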