| 01 | Qwen3.5-Omni-Plus | — | Apr 2026 | Qwen3.5-Omni Technical Report | 1.11 |
| 02 | Granite Speech 4.1 2B (Open) | IBM | May 2025 | Granite-speech: open-source speech-aware LLMs with stron… | 1.33 |
| 03 | Audio Flamingo 3 | — | Jul 2025 | Audio Flamingo 3: Advancing Audio Intelligence with Full… · code | 1.57 |
| 04 | LongCat-Flash-Omni | — | Oct 2025 | LongCat-Flash-Omni Technical Report · code | 1.57 |
| 05 | Parakeet-rnnt-0.6b | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 1.62 |
| 06 | Qwen3-ASR-1.7B (Open) | Alibaba | Jan 2026 | Qwen3-ASR Technical Report · code | 1.63 |
| 07 | Stt-2.6b-en | — | Sep 2024 | Moshi: a speech-text foundation model for real-time dial… · code | 1.70 |
| 08 | CrisperWhisper (Open) | nyrahealth | Aug 2024 | CrisperWhisper: Accurate Timestamps on Verbatim Speech T… · code | 1.82 |
| 09 | Voxtral-Mini-3B-2507 | — | Jul 2025 | Voxtral | 1.88 |
| 10 | SYMPHONY-ASR | — | Jan 2026 | pwc-dump | 1.91 |
| 11 | Wav2Vec 2.0 Large (LS-960) | — | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… · code | 2.00 |
| 12 | Wav2Vec 2.0 Base | — | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… · code | 2.10 |
| 13 | GLM-ASR-Nano-2512 (Open) | Zhipu AI | Dec 2025 | pwc-dump · code | 2.15 |
| 14 | VibeVoice-ASR-HF | — | Jan 2026 | VIBEVOICE-ASR Technical Report | 2.20 |
| 15 | Distil-Whisper Large v3.5 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 2.37 |
| 16 | Cohere Transcribe (Mar 2026) (Open) | Cohere | Mar 2026 | pwc-dump | 2.37 |
| 17 | Parakeet-rnnt-1.1b | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 2.50 |
| 18 | Distil-Whisper Large v3 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 2.54 |
| 19 | Parakeet-TDT-1.1B (Open) | NVIDIA | Apr 2023 | Efficient Sequence Transduction by Jointly Predicting To… · code | 2.60 |
| 20 | Granite 4.0 1B Speech (Open) | IBM | May 2025 | Granite-speech: open-source speech-aware LLMs with stron… | 2.85 |
| 21 | Granite Speech 3.3 8B (Open) | IBM | May 2025 | Granite-speech: open-source speech-aware LLMs with stron… | 2.86 |
| 22 | Canary-1B-Flash (Open) | NVIDIA | Mar 2025 | Training and Inference Efficiency of Encoder-Decoder Spe… | 2.87 |
| 23 | Canary-1B (Open) | NVIDIA | Feb 2024 | pwc-dump | 2.93 |
| 24 | Distil-Whisper Large v2 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 2.94 |
| 25 | Whisper Medium (English) | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 3.02 |
| 26 | Whisper-small.en | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 3.05 |
| 27 | Whisper Small (English) | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 3.05 |
| 28 | Llama 3 Speech (70B) | — | Jul 2024 | The Llama 3 Herd of Models · code | 3.10 |
| 29 | Llama 3 (405B, Instruct) | Meta | Jul 2024 | The Llama 3 Herd of Models · code | 3.10 |
| 30 | Canary-Qwen-2.5B (Open) | NVIDIA | Mar 2025 | Training and Inference Efficiency of Encoder-Decoder Spe… | 3.10 |
| 31 | Parakeet-tdt-0.6b-v2 | — | Apr 2023 | Efficient Sequence Transduction by Jointly Predicting To… · code | 3.19 |
| 32 | Granite Speech 3.3 2B (Open) | IBM | May 2025 | Granite-speech: open-source speech-aware LLMs with stron… | 3.26 |
| 33 | Voxtral-Small-24B-2507 (Open) | Mistral AI | Jul 2025 | Voxtral | 3.26 |
| 34 | Moonshine-base | — | Oct 2024 | Moonshine: Speech Recognition for Live Transcription and… · code | 3.38 |
| 35 | Distil-Whisper Small (English) | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 3.48 |
| 36 | Parakeet-ctc-1.1b | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 3.51 |
| 37 | Canary-1b-v2 | — | Aug 2025 | pwc-dump | 3.56 |
| 38 | Parakeet-tdt-0.6b-v3 | — | Apr 2023 | Efficient Sequence Transduction by Jointly Predicting To… · code | 3.59 |
| 39 | Distil-Whisper Medium (English) | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 3.69 |
| 40 | Parakeet-ctc-0.6b | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 3.80 |
| 41 | Phi-4 Multimodal Instruct (Open) | Microsoft | Mar 2025 | Phi-4-Mini Technical Report: Compact yet Powerful Multim… | 3.82 |
| 42 | Asr-wav2vec2-librispeech | — | Jun 2021 | SpeechBrain: A General-Purpose Speech Toolkit · code | 3.83 |
| 43 | Lite-whisper-large-v3-acc | — | Feb 2025 | LiteASR: Efficient Automatic Speech Recognition with Low… · code | 3.91 |
| 44 | Whisper Large v3 (Open) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 3.91 |
| 45 | Stt_en_fastconformer_transducer_large | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 3.97 |
| 46 | Stt_en_fastconformer_ctc_large | — | May 2023 | Fast Conformer with Linearly Scalable Attention for Effi… | 4.04 |
| 47 | Stt_en_conformer_ctc_large | — | May 2020 | Conformer: Convolution-augmented Transformer for Speech … · code | 4.15 |
| 48 | Asr-conformer-loquacious | — | Feb 2025 | pwc-dump | 4.24 |
| 49 | Whisper Large v3 Turbo (Open) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 4.24 |
| 50 | Whisper base (Open) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 4.25 |
| 51 | Canary-180M-Flash (Open) | NVIDIA | Mar 2025 | Training and Inference Efficiency of Encoder-Decoder Spe… | 4.35 |
| 52 | Lite-whisper-large-v3 | — | Feb 2025 | LiteASR: Efficient Automatic Speech Recognition with Low… · code | 4.40 |
| 53 | Qwen3-ASR-0.6B (Open) | Alibaba | Jan 2026 | Qwen3-ASR Technical Report · code | 4.45 |
| 54 | SYMPHONY | — | Oct 2025 | pwc-dump | 4.48 |
| 55 | Moonshine-streaming-tiny | — | Jan 2026 | pwc-dump | 4.50 |
| 56 | Moonshine-tiny | — | Oct 2024 | Moonshine: Speech Recognition for Live Transcription and… · code | 4.55 |
| 57 | Lite-whisper-large-v3-turbo-acc | — | Feb 2025 | LiteASR: Efficient Automatic Speech Recognition with Low… · code | 4.60 |
| 58 | Owsm_ctc_v4_1B | — | May 2025 | OWSM v4: Improving Open Whisper-Style Speech Models via … · code | 4.89 |
| 59 | Moonshine Streaming Medium (Open) | Useful Sensors | Jan 2026 | pwc-dump | 5.00 |
| 60 | Distil-large-v3.5 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 5.04 |
| 61 | Zipformer-transducer-XL-290M | — | Oct 2023 | Zipformer: A faster and better encoder for automatic spe… · code | 5.04 |
| 62 | Whisper Large v2 (Open) | OpenAI | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 5.14 |
| 63 | Owsm_ctc_v3.1_1B | — | Jan 2024 | OWSM v3.1: Better and Faster Open Whisper-Style Speech M… · code | 5.15 |
| 64 | Distil-large-v3 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 5.19 |
| 65 | Lite-whisper-large-v3-fast | — | Feb 2025 | LiteASR: Efficient Automatic Speech Recognition with Low… · code | 5.19 |
| 66 | Parakeet-tdt_ctc-110m | — | Apr 2023 | Efficient Sequence Transduction by Jointly Predicting To… · code | 5.22 |
| 67 | Voxtral-Mini-4B-Realtime-2602 (Open) | Mistral AI | Feb 2026 | Voxtral Realtime | 5.52 |
| 68 | Whisper Large | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 5.54 |
| 69 | Whisper Tiny (English) | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 5.66 |
| 70 | Moshi ASR | — | Sep 2024 | Moshi: a speech-text foundation model for real-time dial… · code | 5.70 |
| 71 | Whisper-medium.en | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 5.85 |
| 72 | Moonshine-streaming-small | — | Jan 2026 | pwc-dump | 6.78 |
| 73 | Distil-large-v2 | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 6.84 |
| 74 | Distil-small.en | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 7.73 |
| 75 | Stt_en_conformer_ctc_small | — | May 2020 | Conformer: Convolution-augmented Transformer for Speech … · code | 7.92 |
| 76 | Distil-medium.en | — | Nov 2023 | Distil-Whisper: Robust Knowledge Distillation via Large-… · code | 8.35 |
| 77 | Niagara-38m-batch.en | — | Feb 2026 | pwc-dump | 9.35 |
| 78 | Whisper-base.en | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 10.35 |
| 79 | Niagara-19m-batch.en | — | Feb 2026 | pwc-dump | 11.20 |
| 80 | Hubert-xlarge-ls960-ft | — | Jun 2021 | HuBERT: Self-Supervised Speech Representation Learning b… · code | 12.22 |
| 81 | Wav2vec2-large-960h-lv60-self | — | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… · code | 12.42 |
| 82 | Wav2vec2-conformer-rel-pos-large-960h-ft | — | Oct 2020 | fairseq S2T: Fast Speech-to-Text Modeling with fairseq · code | 12.44 |
| 83 | Wav2vec2-base-960h | — | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… · code | 12.53 |
| 84 | Wav2vec2-conformer-rope-large-960h-ft | — | Oct 2020 | fairseq S2T: Fast Speech-to-Text Modeling with fairseq · code | 12.54 |
| 85 | Mms-1b-all | — | May 2023 | Scaling Speech Technology to 1,000+ Languages · code | 12.63 |
| 86 | Hubert-large-ls960-ft | — | Jun 2021 | HuBERT: Self-Supervised Speech Representation Learning b… · code | 12.75 |
| 87 | Data2vec-audio-large-960h | — | Feb 2022 | data2vec: A General Framework for Self-supervised Learni… · code | 12.94 |
| 88 | Wav2vec2-large-robust-ft-libri-960h | — | Apr 2021 | Robust wav2vec 2.0: Analyzing Domain Shift in Self-Super… · code | 13.76 |
| 89 | Whisper-tiny.en | — | Dec 2022 | Robust Speech Recognition via Large-Scale Weak Supervisi… · code | 15.45 |
| 90 | wav2vec 2.0 Large (960h) (Open) | Meta AI | Jun 2020 | wav2vec 2.0: A Framework for Self-Supervised Learning of… · code | 15.46 |
| 91 | Data2vec-audio-base-960h | — | Feb 2022 | data2vec: A General Framework for Self-supervised Learni… · code | 15.48 |
| 92 | Mms-1b-fl102 | — | May 2023 | Scaling Speech Technology to 1,000+ Languages · code | 28.70 |