KittenTTS: State-of-the-Art Voice Synthesis in Under 25 MB

KittenTTS, released on February 19, 2026 by KittenML, is an open-source text-to-speech model whose smallest variant fits in under 25 MB, making it one of the smallest high-quality TTS systems available today. With just 14–15 million parameters in that configuration, it runs entirely on CPUs without any GPU requirement, opening the door for real-time voice synthesis on edge devices, browsers, IoT hardware, and mobile applications.

Three Model Sizes, One Goal: Efficiency

KittenTTS ships in three variants designed for different resource constraints:

  • Nano — 14–15M parameters, ~25 MB (INT8 quantized). The flagship ultra-compact model for devices with minimal storage and compute budgets.
  • Micro — 40M parameters, ~41 MB. A balance between voice quality and efficiency, suitable for mid-range embedded systems.
  • Mini — 80M parameters, ~80 MB. The highest-quality variant, targeting applications where storage is less constrained but GPU access is still unavailable.
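As a rough sanity check on these figures, the short Python sketch below divides each stated file size by the stated parameter count. It assumes 1 MB means one million bytes and takes 15M as the Nano parameter count (the upper end of the 14–15M range); the helper name is purely illustrative. The ratios land between roughly 1.0 and 1.7 bytes per parameter, which is consistent with weights stored at about one byte each, although only the Nano is explicitly described as INT8 here.

```python
# Figures as stated in the list above: (parameter count, approximate size in MB).
# Nano is listed at 14-15M parameters; 15M is used here as the upper bound.
VARIANTS = {
    "Nano": (15_000_000, 25),
    "Micro": (40_000_000, 41),
    "Mini": (80_000_000, 80),
}


def bytes_per_parameter(params: int, size_mb: float) -> float:
    """Average on-disk bytes per parameter, taking 1 MB as 10**6 bytes."""
    return size_mb * 1_000_000 / params


for name, (params, size_mb) in VARIANTS.items():
    ratio = bytes_per_parameter(params, size_mb)
    print(f"{name}: {ratio:.2f} bytes per parameter")
```

Any gap above one byte per parameter would have to come from something other than 8-bit weights (file metadata, for instance), but the article does not break the file sizes down.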

All variants output 24 kHz audio in WAV format and include eight expressive voice options: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo (four female, four male). The project is released under the Apache 2.0 license, and the codebase requires Python 3.12.
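To give a sense of what 24 kHz WAV output means for storage, the sketch below writes one second of a test tone with Python's standard wave module and reads the header back. The article gives only the sample rate; 16-bit PCM and a single channel are assumptions made for illustration, and the tone stands in for real model output. Under those assumptions, one second of audio is about 48 KB of sample data.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # stated in the article
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM (2 bytes per sample)
CHANNELS = 1          # assumption: mono


def tone_wav(seconds: float, freq_hz: float = 440.0) -> bytes:
    """Return a complete WAV file, as bytes, containing a sine tone."""
    n_frames = int(seconds * SAMPLE_RATE)
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)),
        )
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


data = tone_wav(1.0)
with wave.open(io.BytesIO(data), "rb") as wav:
    print("sample rate:", wav.getframerate())
    print("frames:", wav.getnframes())
print("file size in bytes:", len(data))
```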

How It Works

At its core, KittenTTS pairs a lightweight transformer encoder with a neural vocoder. Both components were trained with quantization-aware training (QAT), meaning the rounding error of low-precision weights is simulated during training itself rather than quantization being applied only afterward. This lets the model weights adapt to lower-precision representations and avoid much of the quality degradation typically associated with post-training quantization.
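The idea behind quantization-aware training can be illustrated with a toy example. The sketch below is not KittenTTS's training code: it fits a two-weight linear model in plain Python, and it assumes symmetric per-tensor INT8 quantization with a straight-through estimator, which is one common way QAT is set up. The forward pass always sees weights that have been rounded to the INT8 grid, while gradient updates are applied to full-precision copies.

```python
def fake_quantize(weights, num_bits=8):
    """Round weights to a signed integer grid and map them back to floats."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]
    return quantized, scale


def mse_and_grad(weights, inputs, targets):
    """Mean squared error of a linear model and its gradient w.r.t. the weights."""
    n = len(inputs)
    grad = [0.0] * len(weights)
    loss = 0.0
    for x, y in zip(inputs, targets):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / n
    return loss, grad


def train_qat(inputs, targets, steps=200, lr=0.05):
    """Train with quantized weights in the forward pass; return quantized weights."""
    latent = [0.0] * len(inputs[0])  # full-precision copies that receive updates
    for _ in range(steps):
        w_q, _ = fake_quantize(latent)  # the forward pass sees rounded weights
        _, grad = mse_and_grad(w_q, inputs, targets)
        # Straight-through estimator: treat rounding as the identity in the
        # backward pass and apply the gradient to the full-precision copies.
        latent = [w - lr * g for w, g in zip(latent, grad)]
    return fake_quantize(latent)[0]


# Toy data generated from known weights, so the fit can be checked.
INPUTS = [(1.0, 2.0), (2.0, -1.0), (0.5, 0.5), (-1.0, 1.5)]
TRUE_W = (0.3, -0.7)
TARGETS = [sum(w * xi for w, xi in zip(TRUE_W, x)) for x in INPUTS]

trained = train_qat(INPUTS, TARGETS)
final_loss, _ = mse_and_grad(trained, INPUTS, TARGETS)
print("quantized weights:", trained)
print("loss with quantized weights:", final_loss)
```

Because the loss is always measured with rounded weights, the optimizer settles on values that still work after rounding, which is the property the deployed INT8 model depends on.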

The result is a model family that can be exported to ONNX format for cross-platform deployment, runs locally without cloud API calls (preserving user privacy), and avoids network latency entirely. Version 0.8 represents a significant step forward from the original release, incorporating a 10× expanded training dataset and improved optimization pipelines that the team says deliver enhanced quality, expressivity, and realism.

Quick-start usage is straightforward:

pip install kitten-tts
kitten-tts --model nano --voice Bella --text "Hello from the edge!" --out hello.wav
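For batch jobs, the same command can be assembled programmatically. The Python sketch below builds one command line per voice using only the flags shown in the command above (--model, --voice, --text, --out); it has not been checked against the released package, and the build_command helper and the output file naming are illustrative assumptions. It prints the commands rather than running them.

```python
import shlex

# Voice names as listed in the article.
VOICES = ["Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo"]


def build_command(text, voice, model="nano", out=None):
    """Build an argument list mirroring the quick-start command (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    out = out or f"{voice.lower()}.wav"  # illustrative naming scheme
    return [
        "kitten-tts",
        "--model", model,
        "--voice", voice,
        "--text", text,
        "--out", out,
    ]


for voice in VOICES:
    # shlex.join quotes the text argument so the line can be pasted into a shell.
    print(shlex.join(build_command("Hello from the edge!", voice)))
```

Passing each list to subprocess.run(cmd, check=True) would execute it, provided the CLI is installed and accepts these flags.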

Targeting the Edge-First Era

KittenTTS competes in an increasingly crowded space of compact TTS models, including KaniTTS2, Qwen3-TTS.cpp, and FreeFlow, but distinguishes itself through extreme miniaturization. The Nano model, at under 25 MB, is small enough to ship inside a mobile app, run on a Raspberry Pi, or run in the browser from JavaScript via ONNX Runtime Web (the successor to ONNX.js).

The primary use cases highlighted by the project span:

  • Smart home voice assistants running fully offline
  • Game NPC dialogue generated in real time on consumer hardware
  • Accessibility and screen-reader tools that function without internet access
  • Industrial IoT alert systems and robotics voice interaction
  • Wearable devices with tight memory and power budgets

The project is currently in developer preview (v0.8), and the team notes that some users have encountered minor issues with the Nano INT8 variant. English is the only supported language for now, though multilingual support is on the roadmap for future releases.
