
Common Voice


Common Voice is a massive multilingual dataset of transcribed speech covering diverse demographics and accents.

Benchmark Stats

Models: 14
Papers: 30
Metrics: 6


wer-vi

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | Azure Speech CLI 1.37 | Editorial | 10.21 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Vietnamese test set. |
| 2 | GigaSpeech 2 | Editorial | 11.47 | 2024 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Vietnamese test set. |
| 3 | Google USM Chirp v2 | Editorial | 12.46 | 2024 | Google USM Chirp v2 on Common Voice 17 Vietnamese test set. |
| 4 | Whisper large-v3 | Editorial | 13.74 | 2024 | Whisper large-v3 on Common Voice 17 Vietnamese test set. |
| 5 | Whisper large-v2 | Editorial | 18 | 2024 | Whisper large-v2 on Common Voice 17 Vietnamese test set. |
| 6 | MMS 1B-L1107 | Editorial | 43.88 | 2024 | MMS 1B-L1107 on Common Voice 17 Vietnamese test set. |
| 7 | Whisper base | Editorial | 44.07 | 2024 | Whisper base on Common Voice 17 Vietnamese test set. |

wer-id

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | Editorial | 7.33 | 2024 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Indonesian test set. |
| 2 | Whisper large-v3 | Editorial | 7.43 | 2024 | Whisper large-v3 on Common Voice 17 Indonesian test set. |
| 3 | Whisper large-v2 | Editorial | 8.93 | 2024 | Whisper large-v2 on Common Voice 17 Indonesian test set. |
| 4 | Google USM Chirp v2 | Editorial | 9.7 | 2024 | Google USM Chirp v2 on Common Voice 17 Indonesian test set. |
| 5 | Azure Speech CLI 1.37 | Editorial | 10.33 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Indonesian test set. |
| 6 | MMS 1B-L1107 | Editorial | 20.72 | 2024 | MMS 1B-L1107 on Common Voice 17 Indonesian test set. |
| 7 | Whisper base | Editorial | 34.7 | 2024 | Whisper base on Common Voice 17 Indonesian test set. |

wer-th

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | GigaSpeech 2 | Editorial | 4.15 | 2024 | GigaSpeech 2 fine-tuned model on Common Voice 17 Thai test set. SOTA. |
| 2 | Whisper large-v3 | Editorial | 6.02 | 2024 | Whisper large-v3 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 3 | Whisper large-v2 | Editorial | 8.79 | 2024 | Whisper large-v2 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 4 | Azure Speech CLI 1.37 | Editorial | 10.2 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Thai test set. |
| 5 | MMS 1B-L1107 | Editorial | 14.49 | 2024 | MMS 1B-L1107 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 6 | Google USM Chirp v2 | Editorial | 14.75 | 2024 | Google USM Chirp v2 on Common Voice 17 Thai test set. |
| 7 | Whisper base | Editorial | 32.59 | 2024 | Whisper base on Common Voice 17 Thai test set. |

wer

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | Whisper Large v3 | Community | 8.4 | 2026 | WER (%) on Common Voice 15 English test set. Source: B-Whisper paper Table 1 baseline, arXiv:2502.11572. |
| 2 | LUPET | Editorial | 9.15 | 2024 | LUPET attention decoding on Common Voice 13 (10-language average, EN/FR/ES/ZH/IT/RU/PT/TR/NL/TT). SOTA multilingual. arXiv:2401.03689. |
| 3 | wav2vec 2.0 Large (960h) | Community | 10.5 | 2026 | WER (%) on Common Voice 9 English. Source: Papers With Code / wav2vec2 model card. |
| 4 | Whisper Large v2 | Community | 11.2 | 2026 | WER (%) on Common Voice English. Source: Whisper paper, arXiv:2212.04356. |
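
The scores on this page are word error rates: the number of word substitutions, deletions, and insertions needed to turn a system transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that definition, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and because published results depend on each paper's normalization (and, for languages written without spaces such as Thai, on word segmentation), it should not be expected to reproduce the listed numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.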

wer-en

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | B-Whisper | Editorial | 7 | 2025 | B-Whisper (fine-tuned + contextual biasing prompts) on CV 17.02 English test. 36% relative WER reduction. arXiv:2502.11572. |
| 2 | Vicuna-13B + Whisper Q-Former | Editorial | 8.2 | 2023 | Vicuna-13B + Whisper large-v2 + Q-Former on CV 7.0 English test. ~16% relative WER reduction. |
| 3 | Whisper large-v2 | Editorial | 9.8 | 2023 | Whisper large-v2 on Common Voice 7.0 English test set. Baseline from arXiv:2309.13963. |
| 4 | Whisper large-v3 | Editorial | 11 | 2025 | Whisper large-v3 baseline on Common Voice 17.02 English test. From arXiv:2502.11572. |
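
The relative WER reductions quoted in the wer-en entries can be checked with simple arithmetic. The sketch below assumes each reduction is measured against the Whisper baseline listed for the same test set (B-Whisper against Whisper large-v3 on CV 17.02, and the Vicuna-13B system against Whisper large-v2 on CV 7.0). That pairing is an inference from the entry notes, not something the page states, and `relative_wer_reduction` is a hypothetical helper name.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction in percent: (baseline - system) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Assumed pairing: B-Whisper (7) vs. Whisper large-v3 baseline (11), CV 17.02 English.
print(round(relative_wer_reduction(11.0, 7.0), 1))  # 36.4

# Assumed pairing: Vicuna-13B + Whisper Q-Former (8.2) vs. Whisper large-v2 (9.8), CV 7.0 English.
print(round(relative_wer_reduction(9.8, 8.2), 1))  # 16.3
```

Both results agree with the rounded figures in the entries (36% and ~16%).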

wer-en-accents

Lower is better

| Rank | Model | Source | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 1 | Accent-Specific Codebook ASR | Editorial | 6.43 | 2024 | Accent-Specific Codebook ASR on MCV-Accent-600 English (all accents average WER 6.43%). SOTA. arXiv:2407.03734. |
