NVIDIA Open-Sources SONIC: A Foundation Model for Humanoid Whole-Body Control

NVIDIA has open-sourced SONIC (Supersizing Motion Tracking for Natural Humanoid Whole-Body Control), a 42-million-parameter foundation model that enables humanoid robots to perform natural, full-body movements learned from over 100 million frames of human motion-capture data. Released on February 20, 2026 as part of the GR00T Whole-Body Control platform, SONIC represents a major step toward scalable, general-purpose humanoid robot control.

Why SONIC Matters

Foundation models in language and vision have scaled rapidly, but humanoid robot controllers have remained small and narrow: typically a few million parameters, trained for limited behaviors on a handful of GPUs. SONIC breaks this pattern by scaling along three axes simultaneously: model capacity (from 1.2M to 42M parameters), training data (100M+ frames spanning 700 hours of motion capture), and compute (roughly 9,000 GPU hours across 128 GPUs over 3 days).
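As a quick sanity check on those scaling figures, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch: it uses the numbers quoted in this article, treats "100M+" as a lower bound, and assumes all 128 GPUs ran for the full three days, which the article does not state.

```python
# Figures quoted in the article.
small_params = 1.2e6       # earlier controller size
large_params = 42e6        # SONIC size
frames = 100e6             # "100M+", treated here as a lower bound
mocap_hours = 700
gpus = 128
days = 3

# Growth in model capacity.
param_scale = large_params / small_params

# Upper bound on compute if every GPU ran for the full 3 days (an assumption).
gpu_hours_upper = gpus * days * 24

# Average frame rate implied by the dataset figures.
implied_fps = frames / (mocap_hours * 3600)

print(f"parameter scale-up: about {param_scale:.0f}x")
print(f"GPU hours at full utilization: {gpu_hours_upper}")
print(f"implied average frame rate: at least {implied_fps:.1f} frames/s")
```

The full-utilization bound lands slightly above the quoted 9,000 GPU hours, so the three compute figures are consistent with one another.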

The core insight is to use motion tracking as a universal, scalable training objective instead of hand-engineering task-specific reward functions, a longstanding bottleneck in reinforcement learning for robotics. Because the objective is the same for every motion clip, a single unified policy can handle diverse behaviors, including walking, running, crawling, jumping, boxing, kneeling, and complex manipulation tasks.
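To make the idea of a tracking objective concrete, here is a minimal sketch of a generic pose-tracking reward of the kind commonly used in motion-imitation work. It is not SONIC's actual reward: the function name `tracking_reward`, the `sharpness` parameter, and the choice of an exponentiated mean squared joint error are all assumptions made for illustration.

```python
import math

def tracking_reward(robot_pose, reference_pose, sharpness=2.0):
    """Return a reward in (0, 1]; it equals 1.0 when the poses match exactly.

    Both poses are lists of joint angles in radians (hypothetical layout).
    """
    if not reference_pose or len(robot_pose) != len(reference_pose):
        raise ValueError("poses must be non-empty and the same length")
    mean_sq_error = sum(
        (q - r) ** 2 for q, r in zip(robot_pose, reference_pose)
    ) / len(reference_pose)
    # Larger tracking error gives an exponentially smaller reward.
    return math.exp(-sharpness * mean_sq_error)

# The same function scores any clip; only the reference frame changes.
walking_frame = [0.10, -0.30, 0.25]
print(tracking_reward([0.10, -0.30, 0.25], walking_frame))  # exact match
print(tracking_reward([0.30, -0.10, 0.00], walking_frame))  # worse tracking
```

The point of the sketch is that nothing in the reward is specific to walking, boxing, or crawling, which is what lets one objective cover many behaviors.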

Architecture and Capabilities

SONIC uses a universal encoder-decoder architecture that converts diverse motion commands into a shared latent representation. This enables multiple control interfaces through a single model:

  • VR Teleoperation — Full-body control via PICO headsets and trackers, enabling operators to directly puppet a humanoid in real time
  • Video-to-Motion — Human motion estimation from a monocular webcam at 60+ FPS, allowing video-based teleoperation without special hardware
  • Text and Music Commands — Zero-shot execution from natural language prompts (e.g., “walk stealthily,” “crawl on elbows”) and music-synchronized dancing
  • Gamepad and Keyboard — Interactive locomotion with style control for rapid prototyping
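The shared-latent design described above can be sketched as one small encoder per command interface feeding a single decoder. Everything below is illustrative: the dimensions, the interface names, and the random linear maps standing in for trained networks are assumptions, not details of the released model.

```python
import random

LATENT_DIM = 8   # hypothetical latent size
NUM_JOINTS = 4   # hypothetical; a real humanoid has many more joints

def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a trained network."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def apply_linear(weights, vector):
    """Multiply a weight matrix (list of rows) by an input vector."""
    if any(len(row) != len(vector) for row in weights):
        raise ValueError("input size does not match the layer")
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

rng = random.Random(0)

# One encoder per interface; the input sizes are made up for the example.
ENCODERS = {
    "vr_trackers": make_linear(9, LATENT_DIM, rng),
    "video_pose": make_linear(12, LATENT_DIM, rng),
    "gamepad": make_linear(3, LATENT_DIM, rng),
}
# A single decoder consumes the latent plus the current joint state.
DECODER = make_linear(LATENT_DIM + NUM_JOINTS, NUM_JOINTS, rng)

def control_step(interface, command, joint_state):
    """Encode a command from any interface, then decode joint targets."""
    latent = apply_linear(ENCODERS[interface], command)
    return apply_linear(DECODER, latent + joint_state)

joint_state = [0.0] * NUM_JOINTS
gamepad_targets = control_step("gamepad", [1.0, 0.0, 0.2], joint_state)
vr_targets = control_step("vr_trackers", [0.1] * 9, joint_state)
print(len(gamepad_targets), len(vr_targets))
```

Because every interface ends in the same latent size, adding a new input modality in this sketch only means adding another encoder; the decoder is untouched.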

A real-time kinematic planner generates future motion trajectories in under 5 milliseconds on standard laptop hardware, bridging the gap between high-level commands and low-level joint control.
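One way to picture a planner operating under a latency budget is to time each planning call against the 5 ms figure. The sketch below uses a toy linear-interpolation planner as a stand-in; `plan_trajectory` and `timed_plan` are hypothetical names, and the real planner is far more sophisticated than this.

```python
import time

BUDGET_S = 0.005  # the 5 ms figure quoted in the article

def plan_trajectory(start, goal, steps=20):
    """Toy planner: linearly interpolate each coordinate from start to goal."""
    return [
        [s + (g - s) * i / steps for s, g in zip(start, goal)]
        for i in range(1, steps + 1)
    ]

def timed_plan(start, goal, steps=20):
    """Run the planner and report whether it met the latency budget."""
    t0 = time.perf_counter()
    trajectory = plan_trajectory(start, goal, steps)
    elapsed = time.perf_counter() - t0
    return trajectory, elapsed, elapsed <= BUDGET_S

trajectory, elapsed, within_budget = timed_plan([0.0, 0.0], [1.0, 0.5])
print(f"{len(trajectory)} waypoints in {elapsed * 1e3:.3f} ms, "
      f"within budget: {within_budget}")
```

In a real control loop, a missed budget would typically mean reusing the previous trajectory for that cycle instead of blocking the joint-level controller.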

Real-World Performance

Validated on the Unitree G1 humanoid robot, SONIC achieved a 100% success rate across 50 diverse real-world motion trajectories, including jumps and complex loco-manipulation, operating zero-shot with no real-world fine-tuning. When paired with NVIDIA’s GR00T N1.5 vision-language-action model, the system reached a 95% success rate on mobile pick-and-place tasks.

The team positions SONIC as the “System 1” controller for humanoid robots: a fast, reactive layer that handles instinctive whole-body motor skills and is designed to complement slower “System 2” reasoning and planning systems.
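The System 1 / System 2 split can be illustrated as a two-rate loop in which a slow planner occasionally updates a command and a fast controller acts on the latest command at every tick. The rates, command names, and function names below are hypothetical and chosen only to keep the example small.

```python
FAST_HZ = 50  # hypothetical rate of the reactive controller
SLOW_HZ = 2   # hypothetical rate of the reasoning layer

def slow_planner(tick):
    """Stand-in for System 2: alternates between two high-level commands."""
    plan_index = tick // (FAST_HZ // SLOW_HZ)
    return "walk" if plan_index % 2 == 0 else "reach"

def fast_controller(command, tick):
    """Stand-in for System 1: emits an action every tick from the latest command."""
    return (command, tick)

def run(seconds=1):
    """Simulate the loop and return all actions plus the number of replans."""
    ticks_per_plan = FAST_HZ // SLOW_HZ
    command = None
    actions = []
    plan_updates = 0
    for tick in range(seconds * FAST_HZ):
        if tick % ticks_per_plan == 0:
            command = slow_planner(tick)  # slow layer runs only occasionally
            plan_updates += 1
        actions.append(fast_controller(command, tick))  # fast layer runs every tick
    return actions, plan_updates

actions, plan_updates = run(seconds=1)
print(f"{len(actions)} actions from {plan_updates} plan updates")
```

The fast layer never waits on the slow one in this sketch; it simply keeps acting on the most recent command until a new one arrives.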

Open-Source Availability

The full SONIC release includes model weights (in ONNX format), a C++ inference stack for real-time hardware deployment (including Jetson), VR teleoperation code, and a kinematic planner. The source code is licensed under Apache 2.0, with model weights under the NVIDIA Open Model License (which permits commercial use with attribution). Training scripts and the full-scale motion dataset are planned for future release.

Models are available on Hugging Face, and the code is hosted in the GR00T-WholeBodyControl GitHub repository.