📢 Major Announcement: Qwen3‑ASR & Qwen3‑ForcedAligner Open Sourced

January 30, 2026
Provided by Utku Ege Tuluk

Alibaba’s Qwen3‑ASR family, a new set of automatic speech recognition (ASR) models, has been officially open sourced alongside Qwen3‑ForcedAligner, a novel non‑autoregressive forced alignment model. The models are designed as production‑ready, all‑in‑one speech intelligence systems that work across a wide range of languages and real‑world audio conditions. (Qwen)

Here’s what’s notable about this release:

🌍 All‑in‑One Speech Recognition
• The Qwen3‑ASR family consists of multiple models, each combining language detection and speech‑to‑text transcription in a single system.
• They cover a large set of languages and dialects, supporting robust multilingual transcription. (pandaily.com)

🎙️ Real‑World Performance & Robustness
• Designed to handle messy real‑world audio, including background noise, varied accents, and challenging acoustic environments, while maintaining high accuracy.
• Handles non‑standard speech such as singing and conversational speech, even over background music. (HowAIWorks.ai)

🕐 Flexible & Production‑Ready Tooling
• Includes support for streaming inference, batch processing, and asynchronous use cases suitable for servers and edge deployment.
• The Forced Aligner provides precise word‑level timestamping, useful for subtitles, video editing, and detailed audio analysis. (HowAIWorks.ai)
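
To illustrate the subtitle use case, here is a minimal Python sketch that groups word‑level timestamps into SRT subtitle cues. It does not call Qwen3‑ForcedAligner; the Word record, the words_to_srt helper, and the max_words / max_gap thresholds are hypothetical names chosen for illustration, and the aligner's actual output format may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record for one aligned word; not the aligner's real output type.
    text: str
    start: float  # seconds
    end: float    # seconds


def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8):
    """Group words into cues, starting a new cue when the current one is
    full or when the pause before the next word exceeds max_gap seconds."""
    cues, current = [], []
    for w in words:
        if current and (len(current) >= max_words
                        or w.start - current[-1].end > max_gap):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        line = " ".join(w.text for w in cue)
        blocks.append(
            f"{i}\n{fmt_ts(cue[0].start)} --> {fmt_ts(cue[-1].end)}\n{line}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Word("Hello", 0.0, 0.4), Word("world", 0.45, 0.9),
            Word("again", 2.5, 3.0)]
    print(words_to_srt(demo))
```

Breaking on pauses as well as on word count keeps each cue aligned with natural phrasing; both thresholds are arbitrary defaults and would need tuning for real content.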

🔓 Open Source Availability
• Both Qwen3‑ASR and the forced alignment model are released under an open‑source license, enabling developers to download, integrate, and fine‑tune the models freely. (pandaily.com)