1000 hours of English speech from audiobooks. Standard benchmark for automatic speech recognition.
17 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.
| # | Model | Org | Submitted | Paper / code | wer-test-clean |
|---|---|---|---|---|---|
| 01 | Universal-1 | AssemblyAI | Apr 2024 | official | 1.60 |
| 02 | Conformer-CTC Large (OSS) | NVIDIA / NeMo | Jan 2023 | VALL-E: Neural Codec Language Models are Zero-Shot Text … | 1.70 |
| 03 | Canary-1B (OSS) | NVIDIA | Oct 2023 | Canary: A Multilingual Speech Recognition Model | 1.70 |
| 04 | Parakeet-CTC-1.1B (OSS) | NVIDIA / Suno | Nov 2023 | Parakeet: Efficient, Accurate Speech Recognition Adapted… | 1.70 |
| 05 | wav2vec 2.0 Large (960h) (OSS) | Meta AI | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… | 1.80 |
| 06 | Whisper Large V3 (OSS) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… | 1.80 |
| 07 | HuBERT Large (LS-960) (OSS) | Meta AI | Jun 2021 | HuBERT: Self-Supervised Speech Representation Learning b… | 1.90 |
| 08 | Google USM | | Mar 2023 | Google USM: Scaling Automatic Speech Recognition Beyond … | 2.00 |
| 09 | Whisper Large-v2 (OSS) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… | 2.70 |

| # | Model | Org | Submitted | Paper / code | wer-test-other |
|---|---|---|---|---|---|
| 01 | Universal-1 | AssemblyAI | Apr 2024 | official | 3.10 |
| 02 | wav2vec 2.0 Large (960h) (OSS) | Meta AI | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… | 3.30 |
| 03 | HuBERT Large (LS-960) (OSS) | Meta AI | Jun 2021 | HuBERT: Self-Supervised Speech Representation Learning b… | 3.60 |
| 04 | Whisper Large V3 (OSS) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… | 3.60 |
| 05 | Canary-1B (OSS) | NVIDIA | Oct 2023 | Canary: A Multilingual Speech Recognition Model | 3.80 |
| 06 | Google USM | | Mar 2023 | Google USM: Scaling Automatic Speech Recognition Beyond … | 4.10 |
| 07 | Parakeet-CTC-1.1B (OSS) | NVIDIA / Suno | Nov 2023 | Parakeet: Efficient, Accurate Speech Recognition Adapted… | 4.20 |
| 08 | Whisper Large-v2 (OSS) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… | 5.20 |
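
The ordering used in the tables (lowest WER first, ties broken by submission date) can be reproduced with a two-key sort. The snippet below is a minimal sketch, not the site's actual ranking code. The rows are copied from the wer-test-clean table. The tuple layout, the `rank` helper, and the choice to pin every date to the first of the month are assumptions made for illustration. The "earlier submission wins a tie" direction is inferred from the row order shown, since the page only says that ties are broken by date.

```python
from datetime import date

# Rows copied from the wer-test-clean table: (model, submitted, WER in %).
# Only month and year are published, so the day is pinned to 1 (assumption).
rows = [
    ("Whisper Large V3", date(2022, 12, 1), 1.80),
    ("Parakeet-CTC-1.1B", date(2023, 11, 1), 1.70),
    ("Universal-1", date(2024, 4, 1), 1.60),
    ("wav2vec 2.0 Large (960h)", date(2020, 6, 1), 1.80),
    ("Canary-1B", date(2023, 10, 1), 1.70),
    ("Conformer-CTC Large", date(2023, 1, 1), 1.70),
]


def rank(entries):
    """Sort by WER ascending, then by submission date ascending.

    The earlier-date-first tie-break is inferred from the published
    row order; it is not documented on the page.
    """
    return sorted(entries, key=lambda row: (row[2], row[1]))


ranked = rank(rows)
for position, (model, submitted, wer) in enumerate(ranked, start=1):
    print(f"{position:02d}  {model:<26} {submitted:%b %Y}  {wer:.2f}")
```

With these six rows the three models tied at 1.70 come out in the same order as in the table: Conformer-CTC Large, Canary-1B, then Parakeet-CTC-1.1B.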
Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.
Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.
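
For orientation, both metrics above are word error rates: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, summed over a test set and divided by the total number of reference words. The following is a minimal sketch of that calculation, not the scoring code used for this leaderboard. The function names and the normalization (lowercasing and whitespace splitting only) are assumptions; real evaluation scripts often apply heavier text normalization, which changes the reported number.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Return WER in percent over (reference, hypothesis) string pairs.

    Normalization here is only lowercasing and whitespace splitting,
    which is an assumption made for this sketch.
    """
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),
    ("hello world", "hello world"),
]
print(f"{corpus_wer(pairs):.2f}")
```

Summing errors and reference words across utterances before dividing gives a corpus-level rate, which weights long utterances more heavily than averaging per-utterance rates would.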