LibriSpeech

Johns Hopkins University

1,000 hours of read English speech derived from audiobooks. A standard benchmark for automatic speech recognition (ASR), with clean and noisier ("other") test splits.

Benchmark Stats

Models: 15 · Papers: 28 · Metrics: 2

SOTA History

WER (test-other)

Word Error Rate (%) on the noisier, more accented test-other split. Lower is better.
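WER is the minimum number of word-level substitutions, deletions, and insertions needed to turn a hypothesis transcript into the reference, divided by the number of reference words. A minimal sketch of the computation (the `wer` helper below is illustrative only, not the scoring code behind any leaderboard entry):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitute
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # delete / insert
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> 1/6, i.e. ~16.7% WER.
print(round(wer("the cat sat on the mat", "the cat sat on mat") * 100, 1))  # → 16.7
```

On this scale, a 2.47 score means roughly 2.5 word errors per 100 reference words.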

| Rank | Model | Notes | Source | WER (%) | Year |
|---|---|---|---|---|---|
| 1 | Parakeet RNNT 1.1B | NVIDIA + Suno.ai. 1.1B params. Greedy decoding, no LM. SOTA on test-other. | Editorial | 2.47 | 2025 |
| 2 | Parakeet TDT 0.6B v2 | NVIDIA. 0.6B params. FastConformer-TDT. | Editorial | 3.19 | 2025 |
| 3 | wav2vec 2.0 Large (960h) | wav2vec 2.0 Large, 960h. Table 3, arXiv:2006.11477. | Community | 3.3 | 2026 |
| 4 | Canary 1B v2 | NVIDIA. 1B multilingual ASR+AST. Aug 2025. | Editorial | 3.56 | 2025 |
| 5 | Parakeet TDT 0.6B v3 | NVIDIA. 0.6B params. Multilingual. Sep 2025. | Editorial | 3.59 | 2025 |
| 6 | Whisper Large v3 | OpenAI model card / arXiv:2212.04356. | Editorial | 3.6 | 2024 |
| 7 | HuBERT Large (LS-960) | HuBERT Large, 960h. Table 2, arXiv:2106.07447. | Community | 3.6 | 2026 |
| 8 | Canary-1B | Canary-1B EN. Table 2, arXiv:2310.09873. | Community | 3.8 | 2026 |
| 9 | Voxtral Mini 3B | Mistral AI. 3B multimodal model. July 2025. | Editorial | 4.08 | 2025 |
| 10 | Google USM | Google USM 2B. Table 3, arXiv:2303.01037. | Community | 4.1 | 2026 |
| 11 | Parakeet-CTC-1.1B | Table 1, arXiv:2311.13251. | Community | 4.2 | 2026 |
| 12 | Whisper Large v2 | Table 5, arXiv:2212.04356. | Community | 5.2 | 2026 |
| 13 | Phi-4-multimodal-instruct | Microsoft. 5.6B multimodal model. Feb 2025. | Editorial | 5.97 | 2025 |

WER (test-clean)

Word Error Rate (%) on the clean-read test-clean split. Lower is better.

| Rank | Model | Notes | Source | WER (%) | Year |
|---|---|---|---|---|---|
| 1 | Parakeet RNNT 1.1B | NVIDIA + Suno.ai. 1.1B params. FastConformer-RNNT. Greedy decoding, no LM. SOTA English ASR. | Editorial | 1.46 | 2025 |
| 2 | Phi-4-multimodal-instruct | Microsoft. 5.6B multimodal model. #1 on HF OpenASR leaderboard (March 2025, 6.14% avg WER). | Editorial | 1.67 | 2025 |
| 3 | Parakeet TDT 0.6B v2 | NVIDIA. 0.6B params. FastConformer-TDT. Greedy decoding on HF Open-ASR-Leaderboard framework. | Editorial | 1.69 | 2025 |
| 4 | Parakeet-CTC-1.1B | Table 1, arXiv:2311.13251. | Community | 1.7 | 2026 |
| 5 | Canary-1B | Canary-1B EN. Table 2, arXiv:2310.09873. | Community | 1.7 | 2026 |
| 6 | Conformer-CTC Large | NeMo. NVIDIA NGC model card. | Community | 1.7 | 2026 |
| 7 | Whisper Large v3 | OpenAI model card / arXiv:2212.04356. | Editorial | 1.8 | 2024 |
| 8 | wav2vec 2.0 Large (960h) | Fine-tuned on 960h. Table 3, arXiv:2006.11477. | Community | 1.8 | 2026 |
| 9 | Voxtral Mini 3B | Mistral AI. 3B multimodal model, based on Ministral 3B with an audio encoder. July 2025. | Editorial | 1.89 | 2025 |
| 10 | HuBERT Large (LS-960) | Fine-tuned on 960h. Table 2, arXiv:2106.07447. | Community | 1.9 | 2026 |
| 11 | Parakeet TDT 0.6B v3 | NVIDIA. 0.6B params. Multilingual ASR/AST. FastConformer-TDT. RTFx 3332 (fastest throughput). Sep 2025. | Editorial | 1.93 | 2025 |
| 12 | Google USM | Google USM 2B. Table 3, arXiv:2303.01037. | Community | 2.0 | 2026 |
| 13 | Canary 1B v2 | NVIDIA. 1B multilingual ASR+AST. Supports EN/DE/FR/ES. Aug 2025. | Editorial | 2.18 | 2025 |
| 14 | Whisper Large v2 | Table 5, arXiv:2212.04356. | Community | 2.7 | 2026 |
| 15 | wav2vec 2.0 Large | Meta. 317M params. Self-supervised pre-training on 60k hours of speech. Foundational SSL ASR model. | Editorial | 2.9 | 2024 |
