Common Voice
Massive multilingual dataset of transcribed speech. Covers diverse demographics and accents.
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Vietnamese test set. | Editorial | 10.21 | 2024 | Source |
| 2 | GigaSpeech 2 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Vietnamese test set. | Editorial | 11.47 | 2024 | Source |
| 3 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Vietnamese test set. | Editorial | 12.46 | 2024 | Source |
| 4 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Vietnamese test set. | Editorial | 13.74 | 2024 | Source |
| 5 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Vietnamese test set. | Editorial | 18.00 | 2024 | Source |
| 6 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Vietnamese test set. | Editorial | 43.88 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Vietnamese test set. | Editorial | 44.07 | 2024 | Source |
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Indonesian test set. | Editorial | 7.33 | 2024 | Source |
| 2 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Indonesian test set. | Editorial | 7.43 | 2024 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Indonesian test set. | Editorial | 8.93 | 2024 | Source |
| 4 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Indonesian test set. | Editorial | 9.70 | 2024 | Source |
| 5 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Indonesian test set. | Editorial | 10.33 | 2024 | Source |
| 6 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Indonesian test set. | Editorial | 20.72 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Indonesian test set. | Editorial | 34.70 | 2024 | Source |
Lower is better
| Rank | Model | Notes | Source | Error rate (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | GigaSpeech 2 fine-tuned model on Common Voice 17 Thai test set. SOTA. | Editorial | 4.15 | 2024 | Source |
| 2 | Whisper large-v3 | Whisper large-v3 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 6.02 | 2024 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 8.79 | 2024 | Source |
| 4 | Azure Speech CLI 1.37 | Azure Speech CLI 1.37.0 on Common Voice 17 Thai test set. | Editorial | 10.20 | 2024 | Source |
| 5 | MMS 1B-L1107 | MMS 1B-L1107 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. | Editorial | 14.49 | 2024 | Source |
| 6 | Google USM Chirp v2 | Google USM Chirp v2 on Common Voice 17 Thai test set. | Editorial | 14.75 | 2024 | Source |
| 7 | Whisper base | Whisper base on Common Voice 17 Thai test set. | Editorial | 32.59 | 2024 | Source |
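A note on the Thai metric: Thai is written without spaces between words, so Thai ASR is commonly scored with character error rate (CER) rather than word error rate; the rows above do not state which was used. A minimal CER sketch (assuming plain character-level Levenshtein distance with no text normalization, which real scoring scripts usually apply first):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution (free if match)
        prev = curr
    return prev[-1] / len(ref)
```

Because CER counts characters instead of tokens, it sidesteps the word-segmentation ambiguity that makes WER ill-defined for unsegmented scripts.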
WER (%)
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Whisper Large v3 | WER (%) on Common Voice 15 English test set. Source: B-Whisper paper Table 1 baseline, arXiv:2502.11572. | Community | 8.4 | 2025 | Source |
| 2 | LUPET | LUPET attention decoding on Common Voice 13 (10-language average: EN/FR/ES/ZH/IT/RU/PT/TR/NL/TT). SOTA multilingual. arXiv:2401.03689. | Editorial | 9.15 | 2024 | Source |
| 3 | wav2vec 2.0 Large (960h) | WER (%) on Common Voice 9 English. Source: Papers With Code / wav2vec2 model card. | Community | 10.5 | 2022 | Source |
| 4 | Whisper Large v2 | WER (%) on Common Voice English. Source: Whisper paper, arXiv:2212.04356. | Community | 11.2 | 2022 | Source |
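All scores on this page are error rates. As a reference for how WER is computed, here is a minimal sketch (word-level Levenshtein distance divided by reference length; this is not any leaderboard's actual scoring script, and real evaluations normalize casing and punctuation before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution (free if match)
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why lower is always better and why the scale has no fixed upper bound.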
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | B-Whisper | B-Whisper (fine-tuned + contextual biasing prompts) on CV 17.0 English test. 36% relative WER reduction. arXiv:2502.11572. | Editorial | 7.0 | 2025 | Source |
| 2 | Vicuna-13B + Whisper Q-Former | Vicuna-13B + Whisper large-v2 + Q-Former on CV 7.0 English test. ~16% relative WER reduction. | Editorial | 8.2 | 2023 | Source |
| 3 | Whisper large-v2 | Whisper large-v2 on Common Voice 7.0 English test set. Baseline from arXiv:2309.13963. | Editorial | 9.8 | 2023 | Source |
| 4 | Whisper large-v3 | Whisper large-v3 baseline on Common Voice 17.0 English test. From arXiv:2502.11572. | Editorial | 11.0 | 2025 | Source |
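The relative WER reductions quoted in the notes can be checked directly from the absolute scores (B-Whisper 11.0 → 7.0 against its Whisper large-v3 baseline, and the Q-Former system 9.8 → 8.2 against Whisper large-v2):

```python
def relative_wer_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction in percent: the share of baseline errors removed."""
    return (baseline - system) / baseline * 100

# B-Whisper vs. Whisper large-v3 baseline on CV 17 English
print(round(relative_wer_reduction(11.0, 7.0)))  # 36
# Vicuna-13B + Q-Former vs. Whisper large-v2 baseline on CV 7.0 English
print(round(relative_wer_reduction(9.8, 8.2)))   # 16
```

Relative reduction is measured against each paper's own baseline, so the two percentages are not comparable to each other across rows.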
Lower is better
| Rank | Model | Notes | Source | WER (%) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Accent-Specific Codebook ASR | Accent-Specific Codebook ASR on MCV-Accent-600 English (all-accents average WER 6.43%). SOTA. arXiv:2407.03734. | Editorial | 6.43 | 2024 | Source |