Codesota · Benchmark · Common Voice

Common Voice.

Massive multilingual dataset of transcribed speech. Covers diverse demographics and accents.

§ 01 · Leaderboard

Results by metric.


WER Vi

WER Vi is the word error rate (WER, %) on the Common Voice Vietnamese test set. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
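For readers new to the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that definition only. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the papers behind the scores on this page may apply their own text normalization, and languages written without spaces between words, such as Thai, need a segmentation step first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The call above yields 2/6, which would be reported as a WER of about 33.3%. Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.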

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | Azure Speech CLI 1.37 | verified | 10.21 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Vietnamese test set. |
| 02 | GigaSpeech 2 | verified | 11.47 | 2024 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Vietnamese test set. |
| 03 | Google USM Chirp v2 | verified | 12.46 | 2024 | Google USM Chirp v2 on Common Voice 17 Vietnamese test set. |
| 04 | Whisper large-v3 | verified | 13.74 | 2024 | Whisper large-v3 on Common Voice 17 Vietnamese test set. |
| 05 | Whisper large-v2 | verified | 18 | 2024 | Whisper large-v2 on Common Voice 17 Vietnamese test set. |
| 06 | MMS 1B-L1107 | verified | 43.88 | 2024 | MMS 1B-L1107 on Common Voice 17 Vietnamese test set. |
| 07 | Whisper base | verified | 44.07 | 2024 | Whisper base on Common Voice 17 Vietnamese test set. |

WER Id

WER Id is the word error rate (WER, %) on the Common Voice Indonesian test set. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | GigaSpeech 2 | verified | 7.33 | 2024 | GigaSpeech 2 + Common Voice + FLEURS on CV 17 Indonesian test set. |
| 02 | Whisper large-v3 | verified | 7.43 | 2024 | Whisper large-v3 on Common Voice 17 Indonesian test set. |
| 03 | Whisper large-v2 | verified | 8.93 | 2024 | Whisper large-v2 on Common Voice 17 Indonesian test set. |
| 04 | Google USM Chirp v2 | verified | 9.70 | 2024 | Google USM Chirp v2 on Common Voice 17 Indonesian test set. |
| 05 | Azure Speech CLI 1.37 | verified | 10.33 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Indonesian test set. |
| 06 | MMS 1B-L1107 | verified | 20.72 | 2024 | MMS 1B-L1107 on Common Voice 17 Indonesian test set. |
| 07 | Whisper base | verified | 34.7 | 2024 | Whisper base on Common Voice 17 Indonesian test set. |

WER Th

WER Th is the word error rate (WER, %) on the Common Voice Thai test set. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | GigaSpeech 2 | verified | 4.15 | 2024 | GigaSpeech 2 fine-tuned model on Common Voice 17 Thai test set. SOTA. |
| 02 | Whisper large-v3 | verified | 6.02 | 2024 | Whisper large-v3 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 03 | Whisper large-v2 | verified | 8.79 | 2024 | Whisper large-v2 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 04 | Azure Speech CLI 1.37 | verified | 10.2 | 2024 | Azure Speech CLI 1.37.0 on Common Voice 17 Thai test set. |
| 05 | MMS 1B-L1107 | verified | 14.49 | 2024 | MMS 1B-L1107 on Common Voice 17 Thai test set. Baseline from GigaSpeech 2 paper. |
| 06 | Google USM Chirp v2 | verified | 14.75 | 2024 | Google USM Chirp v2 on Common Voice 17 Thai test set. |
| 07 | Whisper base | verified | 32.59 | 2024 | Whisper base on Common Voice 17 Thai test set. |

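WER

WER is the word error rate (%) reported on Common Voice. The rows below come from different Common Voice releases and language sets, so their scores are not directly comparable. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.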

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | Whisper Large v3 | verified | 8.40 | 2026 | Common Voice 15 English test set. Source: B-Whisper paper Table 1 baseline, arXiv:2502.11572. |
| 02 | LUPET | verified | 9.15 | 2024 | LUPET attention decoding on Common Voice 13 (10-language average, EN/FR/ES/ZH/IT/RU/PT/TR/NL/TT). SOTA multilingual. arXiv:2401.03689. |
| 03 | wav2vec 2.0 Large (960h) | verified | 10.5 | 2020 | Common Voice 9 English. Source: wav2vec2 model card. |
| 04 | Whisper Large v2 | verified | 11.2 | 2026 | Common Voice English. Source: Whisper paper, arXiv:2212.04356. |

WER En

WER En is the word error rate (WER, %) on Common Voice English test sets. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | B-Whisper | verified | 7.00 | 2025 | B-Whisper (fine-tuned + contextual biasing prompts) on CV 17.02 English test. 36% relative WER reduction. arXiv:2502.11572. |
| 02 | Vicuna-13B + Whisper Q-Former | verified | 8.20 | 2023 | Vicuna-13B + Whisper large-v2 + Q-Former on CV 7.0 English test. ~16% relative WER reduction. |
| 03 | Whisper large-v2 | verified | 9.80 | 2023 | Whisper large-v2 on Common Voice 7.0 English test set. Baseline from arXiv:2309.13963. |
| 04 | Whisper large-v3 | verified | 11 | 2025 | Whisper large-v3 baseline on Common Voice 17.02 English test. From arXiv:2502.11572. |
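The "relative WER reduction" figures quoted in the row notes can be checked against the scores in the table above. The sketch below uses the usual definition, (baseline - new) / baseline. The helper name `relative_wer_reduction` is hypothetical, and pairing the Vicuna-13B + Whisper Q-Former result with the Whisper large-v2 row as its baseline is an assumption based on both rows using the CV 7.0 English test set.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Scores are the WER (%) values listed in the table.
# B-Whisper vs. the Whisper large-v3 baseline from the same paper.
b_whisper = relative_wer_reduction(11, 7.00)
# Assumed pairing: Q-Former model vs. the Whisper large-v2 row (both CV 7.0).
q_former = relative_wer_reduction(9.80, 8.20)

print(f"B-Whisper: {b_whisper:.1%}")
print(f"Vicuna-13B + Whisper Q-Former: {q_former:.1%}")
```

Under these pairings the results come out at roughly 36% and 16%, in line with the figures in the notes.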

WER En Accents

WER En Accents is the word error rate (WER, %) on accented English speech from Common Voice, averaged across accents. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better

Trust tiers: verified, paper, vendor, community, unverified

| Rank | Model | Trust | WER (%) | Year | Notes |
|---|---|---|---|---|---|
| 01 | Accent-Specific Codebook ASR | verified | 6.43 | 2024 | Accent-Specific Codebook ASR on MCV-Accent-600 English (all accents average WER 6.43%). SOTA. arXiv:2407.03734. |
