Cohere Transcribe: 2B Open-Source ASR Model Takes #1 on Leaderboard

March 30, 2026 · Provided by Utku Ege Tuluk

On March 26, 2026, Cohere released Cohere Transcribe — a 2-billion-parameter open-source automatic speech recognition (ASR) model that claims the #1 spot on the Hugging Face Open ASR Leaderboard. Licensed under Apache 2.0 and designed to run on consumer-grade GPUs, Transcribe marks Cohere’s first entry into voice AI and signals growing competition in the open-source speech recognition space.


Cohere Transcribe launch banner — Image credit: Cohere

Architecture and Design

Cohere Transcribe uses a Conformer-based encoder-decoder architecture with an asymmetric design: more than 90% of its 2B parameters are dedicated to a Fast-Conformer encoder for acoustic representation, paired with a lightweight Transformer decoder for token generation. Because the encoder processes the audio once while the decoder runs once per output token, keeping the decoder small minimizes autoregressive inference compute while maintaining transcription accuracy.
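To make the asymmetric split concrete, the sketch below works through the parameter budget. It is a back-of-the-envelope illustration, not Cohere's accounting: it treats "2B" as exactly 2,000,000,000, treats "more than 90%" as exactly 90%, and uses decoder weight reads per generated token as a rough proxy for sequential decoding cost. The 50/50 split and the 100-token transcript are hypothetical comparison values.

```python
TOTAL_PARAMS = 2_000_000_000  # "2B" from the article, treated as exact for illustration


def decoder_params(encoder_share_pct: int, total: int = TOTAL_PARAMS) -> int:
    """Parameters left for the decoder given the encoder's percentage share."""
    return total - total * encoder_share_pct // 100


def sequential_weight_reads(decoder_size: int, output_tokens: int) -> int:
    """Rough cost proxy: each autoregressive step reads the decoder weights once."""
    return decoder_size * output_tokens


asymmetric = decoder_params(90)  # upper bound on decoder size under the 90% figure
balanced = decoder_params(50)    # hypothetical 50/50 split, for comparison only

tokens = 100  # hypothetical transcript length in tokens
ratio = sequential_weight_reads(balanced, tokens) / sequential_weight_reads(asymmetric, tokens)

print(f"decoder parameters with a 90% encoder share: {asymmetric:,}")
print(f"decoding cost proxy, balanced vs. asymmetric: {ratio:.1f}x")
```

Under these assumptions the decoder holds at most about 0.2B parameters, so each decoding step touches roughly a fifth of the weights that a balanced split would.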

Unlike competitors such as Qwen3-ASR-1.7B and IBM Granite 4.0 1B Speech — which build on pre-trained text LLMs — Cohere Transcribe uses a dedicated architecture optimized specifically for speech-to-text inference speed and serving cost. The model was trained on 500,000 hours of curated audio-transcript pairs using standard supervised cross-entropy loss, with synthetic data augmentation and non-speech background noise (SNR: 0–30 dB) to improve robustness.
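The noise augmentation described above can be sketched as mixing a noise clip into clean speech at a randomly drawn signal-to-noise ratio. This is a generic illustration of SNR-controlled mixing, not Cohere's training pipeline; the function names, the sine-wave stand-in for speech, and the Gaussian noise source are all assumptions.

```python
import math
import random


def rms(signal: list[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale the noise so that 20*log10(rms(speech) / rms(scaled noise)) equals snr_db, then add it."""
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech: list[float], mixed: list[float]) -> float:
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 20 * math.log10(rms(speech) / rms(residual))


rng = random.Random(0)
# Stand-in signals: a 220 Hz tone as "speech" and white noise as "background".
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

snr_db = rng.uniform(0.0, 30.0)  # draw from the 0-30 dB range quoted in the article
mixed = mix_at_snr(speech, noise, snr_db)
print(f"target SNR {snr_db:.2f} dB, measured {measured_snr_db(speech, mixed):.2f} dB")
```

At 0 dB the noise is as loud as the speech, while at 30 dB it is barely audible, so sampling across the range exposes the model to both extremes.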

The model supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese (Mandarin), Japanese, Korean, Vietnamese, and Arabic.

Benchmark Performance

On the Hugging Face Open ASR Leaderboard, Cohere Transcribe achieves an average word error rate (WER) of 5.42%, the lowest (best) score among the models listed:

Model	Avg WER	Parameters
Cohere Transcribe	5.42%	2B
Zoom Scribe v1	5.47%	—
IBM Granite 4.0 1B Speech	5.52%	1B
NVIDIA Canary Qwen 2.5B	5.63%	2.5B
Qwen3-ASR-1.7B	5.76%	1.7B
ElevenLabs Scribe v2	5.83%	—
OpenAI Whisper Large v3	7.44%	1.6B
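For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is a minimal implementation of that definition using edit distance. It omits the text normalization (casing, punctuation) that benchmark harnesses typically apply before scoring, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed so far
    # and the first j hypothesis words; it starts as the cost of j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer:.2%}")
```

Because the leaderboard figure is an average over several test sets, small gaps such as 5.42% versus 5.47% correspond to very few differing words per thousand.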

The model also delivers up to 3x higher offline throughput than similarly sized competitors, placing it on the Pareto frontier of the speed-accuracy tradeoff.

Throughput vs. accuracy scatter plot showing Cohere Transcribe on the Pareto frontier — Image credit: Cohere
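A model sits on the Pareto frontier when no other model is at least as fast and at least as accurate while being strictly better on one of the two. The sketch below computes that set for a handful of points. The model names and numbers are hypothetical placeholders chosen to exercise the logic, not values read from Cohere's plot.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    throughput: float  # higher is better (arbitrary units)
    wer: float         # lower is better (percent)


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point dominates, sorted by throughput."""
    frontier = []
    for p in points:
        dominated = any(
            q.throughput >= p.throughput
            and q.wer <= p.wer
            and (q.throughput > p.throughput or q.wer < p.wer)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.throughput)


# Hypothetical points, NOT measurements from the article.
models = [
    Point("model-a", throughput=100.0, wer=5.4),
    Point("model-b", throughput=40.0, wer=5.5),   # slower and less accurate than model-a
    Point("model-c", throughput=300.0, wer=7.0),  # fastest, least accurate
    Point("model-d", throughput=30.0, wer=5.2),   # slowest, most accurate
]

frontier_names = [p.name for p in pareto_frontier(models)]
print(frontier_names)
```

In this toy data only model-b falls off the frontier, because model-a beats it on both axes; the other three each represent a different speed-accuracy tradeoff.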

Human Evaluation and Multilingual Results

In pairwise human evaluation on English transcription, Cohere Transcribe achieved a 61% average win rate across criteria including meaning preservation, hallucination prevention, named entity recognition, and formatting. The strongest preference margins were against OpenAI Whisper Large v3 (64%) and IBM Granite (78%).

Human preference evaluation chart showing Cohere Transcribe win rates against competitors — Image credit: Cohere
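An average win rate of this kind is usually derived by computing a win rate against each competitor and then averaging across competitors. The sketch below shows one way to do that. The article does not say how ties were scored, so counting a tie as half a win is an assumption, and the judgments and system names are invented for illustration.

```python
from collections import defaultdict


def win_rates(judgments: list[tuple[str, str]]) -> dict[str, float]:
    """Per-opponent win rate from (opponent, outcome) pairs.

    outcome is "win", "loss", or "tie"; a tie counts as half a win (assumption).
    """
    score: dict[str, float] = defaultdict(float)
    total: dict[str, int] = defaultdict(int)
    for opponent, outcome in judgments:
        if outcome not in {"win", "loss", "tie"}:
            raise ValueError(f"unknown outcome: {outcome!r}")
        total[opponent] += 1
        if outcome == "win":
            score[opponent] += 1.0
        elif outcome == "tie":
            score[opponent] += 0.5
    return {opponent: score[opponent] / total[opponent] for opponent in total}


# Hypothetical annotator judgments, NOT data from Cohere's evaluation.
judgments = [
    ("system-x", "win"), ("system-x", "win"), ("system-x", "loss"), ("system-x", "tie"),
    ("system-y", "win"), ("system-y", "loss"),
]

per_opponent = win_rates(judgments)
average = sum(per_opponent.values()) / len(per_opponent)
print(per_opponent, f"average: {average:.2%}")
```

Averaging per-opponent rates, rather than pooling all judgments, keeps a heavily sampled competitor from dominating the headline number.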

Across its supported languages, the model ranks 4th overall and 2nd among open-source models on the multilingual ASR leaderboard, with particularly strong preference rates in Japanese (70%) and Italian (60%).


Per-language error rate comparison across FLEURS, Common Voice, MLS, and Wenet benchmarks — Image credit: Cohere Labs on Hugging Face

Availability and Deployment

Cohere Transcribe is available through multiple channels:

Open-source download on Hugging Face under Apache 2.0
Free API access (rate-limited) via the Cohere dashboard
Model Vault — Cohere’s dedicated managed inference for production without rate limits
vLLM integration with optimized batching for up to 2x throughput improvement

Cohere also plans to integrate Transcribe into its enterprise agent orchestration platform, North, expanding from pure transcription into broader speech intelligence capabilities.

Limitations

The model has some notable constraints. It does not support automatic language detection, so a language code must be specified. It lacks speaker diarization and timestamp output. It can also hallucinate text from non-speech sounds, so Cohere recommends voice activity detection (VAD) preprocessing for noisy audio.
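To illustrate what VAD preprocessing does, the sketch below uses a simple energy threshold to find the spans of a signal that are loud enough to be speech, so that only those spans would be sent to the transcriber. This is a deliberately naive stand-in: production systems typically use a trained VAD model, the article does not say which one Cohere recommends, and the frame size, the threshold, and the synthetic test signal are assumptions.

```python
import math


def speech_segments(
    samples: list[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold_db: float = -40.0,
) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS level exceeds threshold_db.

    Levels are in dB relative to a full-scale amplitude of 1.0.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    segments = []
    start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        level = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(level) if level > 0 else float("-inf")
        t = i * frame_len / sample_rate
        if level_db > threshold_db and start is None:
            start = t                    # segment opens at this frame
        elif level_db <= threshold_db and start is not None:
            segments.append((start, t))  # segment closes at this frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


# Synthetic test signal: 0.3 s silence, 0.6 s of a 440 Hz tone, 0.3 s silence.
rate = 16000
silence = [0.0] * int(0.3 * rate)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(int(0.6 * rate))]
segments = speech_segments(silence + tone + silence, rate)
print(segments)
```

Only the detected spans would then be passed to the model, which avoids feeding it long stretches of silence or background noise that can trigger hallucinated text.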

Related Coverage

Voxtral Transcribe 2: Mistral’s Open Real-Time Speech-to-Text — Mistral’s competing open-source ASR platform, released February 2026
Qwen3-ASR & Qwen3-ForcedAligner Open Sourced — Alibaba’s production-ready ASR models, released January 2026

Sources

Cohere Transcribe: a new state-of-the-art in speech recognition — Official Cohere blog
Introducing Cohere-transcribe: state-of-the-art speech recognition — Hugging Face blog
CohereLabs/cohere-transcribe-03-2026 — Model card on Hugging Face
Cohere launches an open source voice model specifically for transcription — TechCrunch