Codesota · Speech · STT leaderboard
SOTA speech-to-text · OSS ASR · API models
Updated May 7, 2026
§ 00 · Direct answer

STT leaderboard: the speech-to-text SOTA.

The short answer: Parakeet RNNT 1.1B is the current SOTA entry in this STT leaderboard, with 1.8% WER on LibriSpeech test-clean. For hosted production streaming, start with Deepgram Nova-3. For an open-source ASR model, start with Parakeet RNNT 1.1B, or with Whisper Large v3 Turbo if multilingual coverage matters most.

This page targets the queries "STT leaderboard", "STT SOTA", "SOTA speech-to-text" and "SOTA OSS ASR model". It reuses the shared CodeSOTA speech model catalogue rather than a separate hand-maintained table.

§ 01 · Leaderboard

ASR models, ranked by WER.

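Lower WER (word error rate) is better. Rows are ranked by the shared catalogue score, which is usually WER on LibriSpeech test-clean; the Dataset column flags the exception, Gemini 3 Pro (audio), which is scored on AA-WER v2.0 and so is not directly comparable with the other rows. Featured entries include Parakeet RNNT, Deepgram Nova-3, Whisper v3 Turbo and Voxtral.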

| # | Model | Vendor | Kind | Architecture | Params | Dataset | WER |
|---|---|---|---|---|---|---|---|
| 01 | Parakeet RNNT 1.1B (Featured) | NVIDIA | Open Source | Conformer + RNNT | 1.1B | LibriSpeech test-clean | 1.8% |
| 02 | Conformer XL | Google | Research | Conformer + LAS | 600M | LibriSpeech test-clean | 2.0% |
| 03 | Deepgram Nova-3 (Featured) | Deepgram | Cloud API | Proprietary Transformer | | LibriSpeech test-clean | 2.2% |
| 04 | Voxtral Large (Featured) | Mistral AI | Cloud API | Audio-Language Model (Transformer) | | LibriSpeech test-clean | 2.3% |
| 05 | AssemblyAI Universal-2 | AssemblyAI | Cloud API | Conformer-based | | LibriSpeech test-clean | 2.4% |
| 06 | Canary 1B | NVIDIA | Open Source | FastConformer + multi-task | 1B | LibriSpeech test-clean | 2.4% |
| 07 | Whisper Large v3 Turbo (Featured) | OpenAI | Open Source | Transformer Encoder-Decoder | 809M | LibriSpeech test-clean | 2.5% |
| 08 | Gladia v2 | Gladia | Cloud API | Whisper-based + custom | | LibriSpeech test-clean | 2.5% |
| 09 | Google Chirp 3 | Google | Cloud API | Generative (USM-based) | | LibriSpeech test-clean | 2.5% |
| 10 | Speechmatics Flow | Speechmatics | Cloud API | Proprietary | | LibriSpeech test-clean | 2.6% |
| 11 | Whisper Large v3 | OpenAI | Open Source | Transformer Encoder-Decoder | 1.55B | LibriSpeech test-clean | 2.7% |
| 12 | Groq Whisper | Groq | Cloud API | Whisper on LPU | 1.55B | LibriSpeech test-clean | 2.7% |
| 13 | Google USM | Google | Cloud API | Universal Speech Model | 2B | LibriSpeech test-clean | 2.8% |
| 14 | Voxtral Mini | Mistral AI | Cloud API | Audio-Language Model (Transformer) | | LibriSpeech test-clean | 2.8% |
| 15 | Gemini 3 Pro (audio) | Google | Cloud API | Multimodal LLM | | AA-WER v2.0 | 2.9% |
| 16 | Azure Speech | Microsoft | Cloud API | Proprietary | | LibriSpeech test-clean | 3.0% |
| 17 | Moonshine Base | Useful Sensors | Open Source | Optimized Transformer | 61M | LibriSpeech test-clean | 3.5% |
| 18 | wav2vec 2.0 | Meta | Open Source | Self-supervised Transformer | 317M | LibriSpeech test-clean | 3.8% |
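
For readers unfamiliar with the metric, the WER column can be read as word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The following is a minimal sketch, not the scoring code behind this leaderboard: it assumes lowercase, whitespace-split tokens, whereas real benchmark pipelines apply their own text normalization, which this page does not specify. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.1%}")
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.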
§ 02 · Decision

Open-source ASR or API?

The SOTA speech-to-text choice is not only a matter of the lowest WER. It is a trade-off among accuracy, latency, privacy, language coverage and operations.
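
To judge how much weight the accuracy term deserves in that decision, it can help to convert WER figures into errors per 1,000 words and relative gaps. The sketch below is plain arithmetic on two figures from this leaderboard (Parakeet RNNT 1.1B at 1.8% and Whisper Large v3 Turbo at 2.5%); the helper names are hypothetical.

```python
def errors_per_thousand_words(wer_percent: float) -> float:
    """Expected word errors per 1,000 reference words at a given WER (in percent)."""
    return wer_percent * 10


def relative_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction of the baseline's errors that the candidate removes."""
    return (baseline_wer - candidate_wer) / baseline_wer


if __name__ == "__main__":
    parakeet, whisper_turbo = 1.8, 2.5  # WER in percent, from the leaderboard
    print(round(errors_per_thousand_words(parakeet)))       # errors per 1,000 words
    print(round(errors_per_thousand_words(whisper_turbo)))
    print(f"{relative_reduction(whisper_turbo, parakeet):.0%}")
```

On these two rows the gap is roughly 7 words per 1,000, which is the kind of difference that latency, privacy or language coverage can reasonably outweigh.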

Choose open-source when

You need control and unit economics.

Run Parakeet RNNT for English accuracy, Whisper v3 Turbo for multilingual coverage, Canary for ASR plus translation, or Moonshine for edge devices. This is usually the right path for private audio, high volume and custom deployment.

Choose an API when

You need product features now.

Deepgram Nova-3, AssemblyAI, Gladia, Speechmatics, Groq Whisper and Voxtral are better fits when streaming, diarization, hosted scaling, translation or audio-LLM behavior matters more than self-hosting the ASR stack.
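
The open-source versus API split can be sketched as a filter over leaderboard rows followed by a sort on WER. This is an illustration only: the `Entry` type and `shortlist` function are hypothetical, the rows are a subset copied from the leaderboard (all scored on LibriSpeech test-clean), and a real decision would also weigh latency, privacy, language coverage and operations, which a WER sort cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    kind: str   # "Open Source" or "Cloud API", as in the Kind column
    wer: float  # WER in percent, as listed in the leaderboard


# Subset of leaderboard rows (LibriSpeech test-clean only).
ENTRIES = [
    Entry("Parakeet RNNT 1.1B", "Open Source", 1.8),
    Entry("Deepgram Nova-3", "Cloud API", 2.2),
    Entry("Voxtral Large", "Cloud API", 2.3),
    Entry("Canary 1B", "Open Source", 2.4),
    Entry("Whisper Large v3 Turbo", "Open Source", 2.5),
    Entry("Moonshine Base", "Open Source", 3.5),
]


def shortlist(entries: list[Entry], self_host: bool) -> list[Entry]:
    """Keep open-source rows when self-hosting, API rows otherwise; lowest WER first."""
    kind = "Open Source" if self_host else "Cloud API"
    return sorted((e for e in entries if e.kind == kind), key=lambda e: e.wer)


if __name__ == "__main__":
    for entry in shortlist(ENTRIES, self_host=True):
        print(f"{entry.model}: {entry.wer}%")
```

The shortlist is a starting point: within the open-source group, for example, the guidance above would still pick Whisper v3 Turbo over a lower-WER row when multilingual coverage matters.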

§ 03 · Next reads

Related speech pages.

Speech hub · Speech-to-text · Speech recognition guide · LibriSpeech benchmark