Common Voice
Massive multilingual dataset of transcribed speech. Covers diverse demographics and accents.
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Vietnamese test set. | Editorial | 10.21 | 2024 | Source |
| 2 | GigaSpeech 2 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Vietnamese test set. | Editorial | 11.47 | 2024 | Source |
| 3 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Vietnamese test set. | Editorial | 12.46 | 2024 | Source |
| 4 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Vietnamese test set. | Editorial | 13.74 | 2024 | Source |
| 5 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Vietnamese test set. | Editorial | 18.00 | 2024 | Source |
| 6 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Vietnamese test set. | Editorial | 43.88 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Vietnamese test set. | Editorial | 44.07 | 2024 | Source |
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Indonesian test set. | Editorial | 7.33 | 2024 | Source |
| 2 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Indonesian test set. | Editorial | 7.43 | 2024 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Indonesian test set. | Editorial | 8.93 | 2024 | Source |
| 4 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Indonesian test set. | Editorial | 9.70 | 2024 | Source |
| 5 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Indonesian test set. | Editorial | 10.33 | 2024 | Source |
| 6 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Indonesian test set. | Editorial | 20.72 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Indonesian test set. | Editorial | 34.70 | 2024 | Source |
Lower is better
| Rank | Model | Notes | Source | Error rate (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | GigaSpeech 2 fine-tuned model on Common Voice 17 Thai test set. SOTA. | Editorial | 4.15 | 2024 | Source |
| 2 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 6.02 | 2024 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 8.79 | 2024 | Source |
| 4 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Thai test set. | Editorial | 10.20 | 2024 | Source |
| 5 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 14.49 | 2024 | Source |
| 6 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Thai test set. | Editorial | 14.75 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Thai test set. | Editorial | 32.59 | 2024 | Source |
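A note on the Thai metric: Thai is written without spaces between words, so Thai ASR is commonly scored with character error rate (CER) rather than word error rate; the rows above do not state which was used. A minimal CER sketch (assuming plain character-level Levenshtein distance with no text normalization, which real scoring scripts usually apply first):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution (free if match)
        prev = curr
    return prev[-1] / len(ref)
```

Because CER counts characters instead of tokens, it sidesteps the word-segmentation ambiguity that makes WER ill-defined for unsegmented scripts.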
WER (%)
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Whisper Large v3 | WER (%) on Common Voice 15 English test set. Source: B-Whisper paper Table 1 baseline, arXiv:2502.11572. | Community | 8.4 | 2025 | Source |
| 2 | LUPET | LUPET attention decoding on Common Voice 13 (10-language average: EN/FR/ES/ZH/IT/RU/PT/TR/NL/TT). SOTA multilingual. arXiv:2401.03689. | Editorial | 9.15 | 2024 | Source |
| 3 | wav2vec 2.0 Large (960h) | WER (%) on Common Voice 9 English. Source: Papers With Code / wav2vec2 model card. | Community | 10.5 | 2022 | Source |
| 4 | Whisper Large v2 | WER (%) on Common Voice English. Source: Whisper paper, arXiv:2212.04356. | Community | 11.2 | 2022 | Source |
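All scores on this page are error rates. As a reference for how WER is computed, here is a minimal sketch (word-level Levenshtein distance divided by reference length; this is not any leaderboard's actual scoring script, and real evaluations normalize casing and punctuation before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution (free if match)
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why lower is always better and why the scale has no fixed upper bound.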
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | B-Whisper | B-Whisper (fine-tuned + contextual biasing prompts) on CV 17.0 English test. 36% relative WER reduction. arXiv:2502.11572. | Editorial | 7.0 | 2025 | Source |
| 2 | Vicuna-13B + Whisper Q-Former | Vicuna-13B + Whisper large-v2 + Q-Former on CV 7.0 English test. ~16% relative WER reduction. | Editorial | 8.2 | 2023 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 7.0 English test set. Baseline from arXiv:2309.13963. | Editorial | 9.8 | 2023 | Source |
| 4 | Whisper large-v3 | Whisper large-v3 baseline on Common Voice 17.0 English test. From arXiv:2502.11572. | Editorial | 11.0 | 2025 | Source |
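The relative WER reductions quoted in the notes can be checked directly from the absolute scores (B-Whisper 11.0 → 7.0 against its Whisper large-v3 baseline, and the Q-Former system 9.8 → 8.2 against Whisper large-v2):

```python
def relative_wer_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction in percent: the share of baseline errors removed."""
    return (baseline - system) / baseline * 100

# B-Whisper vs. Whisper large-v3 baseline on CV 17 English
print(round(relative_wer_reduction(11.0, 7.0)))  # 36
# Vicuna-13B + Q-Former vs. Whisper large-v2 baseline on CV 7.0 English
print(round(relative_wer_reduction(9.8, 8.2)))   # 16
```

Relative reduction is measured against each paper's own baseline, so the two percentages are not comparable to each other across rows.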
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Accent-Specific Codebook ASR | Accent-Specific Codebook ASR on MCV-Accent-600 English (all-accents average WER 6.43%). SOTA. arXiv:2407.03734. | Editorial | 6.43 | 2024 | Source |