Alibaba's Qwen3-ASR family, a new set of automatic speech recognition (ASR) models, has been officially open-sourced alongside a novel non-autoregressive forced alignment model, Qwen3-ForcedAligner. The models are designed as production-ready, all-in-one speech intelligence systems that work across a wide range of languages and real-world audio conditions. (Qwen)
Here's what's notable about this release:
All-in-One Speech Recognition
• The Qwen3-ASR family consists of multiple models that combine language detection and speech-to-text transcription in one system.
• The models cover a large set of languages and dialects, supporting robust multilingual transcription. (pandaily.com)
Real-World Performance & Robustness
• Designed to handle messy real-world audio, such as background noise, varied accents, and challenging acoustic environments, while maintaining high accuracy.
• Works well with non-standard speech types such as singing or conversational speech, even with background music. (HowAIWorks.ai)
Flexible & Production-Ready Tooling
• Includes support for streaming inference, batch processing, and asynchronous use cases, suitable for both server and edge deployment.
• The forced aligner provides precise word-level timestamps, useful for subtitles, video editing, and detailed audio analysis. (HowAIWorks.ai)
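To make the subtitle use case concrete, here is a minimal sketch of how word-level timestamps could be turned into an SRT subtitle file. It does not call Qwen3-ForcedAligner or reproduce its output format: the `Word` structure, the `words_to_srt` function, and the `max_words` and `max_gap` parameters are all hypothetical and chosen only for illustration. The sketch assumes each aligned word comes with start and end times in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One aligned word; times are in seconds (hypothetical structure)."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group words into subtitle cues and return them as SRT text.

    A new cue starts when the current cue already holds max_words words
    or when the pause before the next word is longer than max_gap seconds.
    """
    cues: List[List[Word]] = []
    for word in words:
        fits = (
            bool(cues)
            and len(cues[-1]) < max_words
            and word.start - cues[-1][-1].end <= max_gap
        )
        if fits:
            cues[-1].append(word)
        else:
            cues.append([word])

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0].start)
        end = format_timestamp(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)
```

Calling `words_to_srt` on a list of `Word` objects returns a string that can be written directly to a `.srt` file. The grouping thresholds are arbitrary defaults; a real pipeline would likely tune them for reading speed and line length.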
Open Source Availability
• Both Qwen3-ASR and the forced alignment model are released under an open-source license, enabling developers to download, integrate, and fine-tune the models freely. (pandaily.com)