Speech Recognition

Automatic speech recognition went from a specialized pipeline (acoustic model + language model + decoder) to a single end-to-end model with OpenAI's Whisper (2022), which was trained on 680K hours of web audio and quickly became the de facto open-source standard. Whisper large-v3 reaches under 5% word error rate (WER) on LibriSpeech clean, while commercial APIs from Google, AWS, and Deepgram compete fiercely on noisy, accented, and multilingual speech, where error rates are 2-3x higher. The real frontier is real-time streaming ASR at conversational latency (<500 ms), code-switching between languages mid-sentence, and robust recognition of domain-specific terminology (medical, legal, technical). AssemblyAI's Universal-2 and Deepgram's Nova-3 currently lead production benchmarks, but the gap with fine-tuned Whisper variants is narrow.

Datasets: 4
Results: 72
Canonical metric: WER
Canonical Benchmark

Common Voice

A massive multilingual dataset of transcribed speech covering diverse demographics and accents. It spans over 100 languages and is updated continuously by the Mozilla Foundation.

Primary metric: WER (word error rate; lower is better)
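
For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes simple whitespace tokenization with no text normalization (casing, punctuation); it is not necessarily the exact scoring procedure behind the leaderboard numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly;
    real evaluations usually normalize text before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {100 * word_error_rate(ref, hyp):.1f}%")
```

The function returns a fraction, so multiply by 100 to compare with percentage-style scores. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.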

Top 10

Leading models on Common Voice.

Rank  Model                 wer-vi  Year  Source
1     Whisper base          44.1    2024  paper
2     MMS 1B-L1107          43.9    2024  paper
3     Whisper base          34.7    2024  paper
4     Whisper base          32.6    2024  paper
5     MMS 1B-L1107          20.7    2024  paper
6     Whisper large-v2      18.0    2024  paper
7     Google USM Chirp v2   14.8    2024  paper
8     MMS 1B-L1107          14.5    2024  paper
9     Whisper large-v3      13.7    2024  paper
10    Google USM Chirp v2   12.5    2024  paper


All datasets

4 datasets tracked for this task.
