Codesota · Models · 1,357 models indexed · 104 match filter
Every model, measured.
Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
Vendor: Areas overview
speakleash · 253, OpenAI · 85, Google · 71, Qwen · 52, Alibaba · 47, Anthropic · 44, Microsoft · 35, Meta · 30, Mistral · 30, DeepSeek · 28, google · 19, meta-llama · 19, mistralai · 19, Meta AI · 15, CYFRAGOVPL · 14, Zhipu AI · 13, NVIDIA · 10, SpeakLeash · 10, internlm · 10, xAI · 10, ByteDance · 9, Baidu · 8, PLLuM · 8, ibm-granite · 8, microsoft · 8, Amazon · 7, Google DeepMind · 7, MiniMax · 7, Mistral AI · 7, Remek · 7, Shanghai AI Lab · 7, allenai · 7, utter-project · 7, CohereForAI · 6, Microsoft Research · 6, Salesforce · 6, 01-ai · 5, Alibaba Cloud · 5, Cohere · 5, Moonshot AI · 5, NousResearch · 5, THUML · 5, deepseek-ai · 5, DeepMind · 4, Facebook AI · 4, IBM · 4, Meituan · 4, Stanford · 4, THUDM · 4, UC San Diego · 4, VikParuchuri · 4, gguf-iq · 4, nvidia · 4, openchat · 4, tiiuae · 4, Allen AI · 3, BAAI · 3, Du et al. · 3, ForgeCode · 3, Fudan University · 3, IDEA Research · 3, Liao et al. · 3, Moonshot.AI · 3, Nam Tuan Ly / NII · 3, OPI-PG · 3, OpenDataLab · 3, ViCoS Lab Ljubljana · 3, Xiaomi · 3, Zhao et al. · 3, gguf · 3, gguf11bv30 · 3, gguf7bv30 · 3, upstage · 3
+ 247 smaller vendors (291 models)
§ 01 · Speech models
104 models in Speech · page 2 of 3.
| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 051 | Whisper-tiny.en | — | — | — | 8 | 9 | |
| 052 | Niagara-38m-batch.en | — | — | — | 8 | 8 | |
| 053 | Qwen3-ASR-1.7B | Alibaba | 1.7B | Transformer (Qwen3 backbone) | 7 | 8 | |
| 054 | Cohere Transcribe (Mar 2026) | Cohere | 2B | Transformer ASR | 6 | 7 | |
| 055 | LongCat-Flash-Omni | — | — | — | 7 | 7 | |
| 056 | Canary-Qwen-2.5B | NVIDIA | 2.5B | FastConformer encoder + Qwen2 LM decoder | 6 | 6 | |
| 057 | Owsm_ctc_v3.1_1B | — | — | — | 5 | 6 | |
| 058 | Parakeet-tdt-0.6b-v2 | — | — | — | 5 | 6 | |
| 059 | Moonshine-streaming-small | — | — | — | 4 | 5 | |
| 060 | Niagara-19m-batch.en | — | — | — | 5 | 5 | |
| 061 | Granite Speech 3.3 8B | IBM | 8B | Transformer | 4 | 4 | |
| 062 | Canary-1B | NVIDIA | 1B | FastConformer encoder + Transformer decoder | 1 | 3 | |
| 063 | Moonshine Streaming Medium | Useful Sensors | 245M | Causal encoder-decoder | 2 | 3 | |
| 064 | Canary-1B-Flash | NVIDIA | 1B | FastConformer + TDT decoder | 2 | 2 | |
| 065 | Distil-large-v3.5 | — | — | — | 2 | 2 | |
| 066 | Google USM | — | 2B | Conformer encoder + RNN-T/CTC | 1 | 2 | |
| 067 | Granite 4.0 1B Speech | IBM | 1B | Transformer | 2 | 2 | |
| 068 | HuBERT Large (LS-960) | Meta AI | 317M | CNN + Transformer (BERT-style) | 1 | 2 | |
| 069 | Lite-whisper-large-v3-acc | — | — | — | 2 | 2 | |
| 070 | Llama 3 Speech (70B) | — | — | — | 2 | 2 | |
| 071 | Parakeet-CTC-1.1B | NVIDIA / Suno | 1.1B | FastConformer-CTC | 1 | 2 | |
| 072 | Parakeet-tdt-0.6b-v3 | — | — | — | 2 | 2 | |
| 073 | Pulse STT | Smallest AI | — | Proprietary streaming STT | 1 | 2 | |
| 074 | Qwen3-ASR-0.6B | Alibaba | 0.6B | Transformer (Qwen3 backbone) | 2 | 2 | |
| 075 | Voxtral-Mini-3B-2507 | — | — | — | 2 | 2 | |
| 076 | Voxtral-Small-24B-2507 | Mistral AI | 24B | Large multimodal LM with audio encoder | 2 | 2 | |
| 077 | Canary-180M-Flash | NVIDIA | 180M | FastConformer-Small + TDT | 1 | 1 | |
| 078 | Canary-1b-v2 | — | — | — | 1 | 1 | |
| 079 | Conformer-CTC Large | NVIDIA / NeMo | 118M | Conformer (Conv + Attention) + CTC | 1 | 1 | |
| 080 | CrisperWhisper | nyrahealth | 1.5B | Whisper fine-tune with alignment | 1 | 1 | |
| 081 | Distil-Whisper Large v2 | — | — | — | 1 | 1 | |
| 082 | Distil-Whisper Large v3 | — | — | — | 1 | 1 | |
| 083 | Distil-Whisper Large v3.5 | — | — | — | 1 | 1 | |
| 084 | Distil-Whisper Medium (English) | — | — | — | 1 | 1 | |
| 085 | Distil-Whisper Small (English) | — | — | — | 1 | 1 | |
| 086 | ECAPA-TDNN | Ghent University | ~14.7M | ECAPA-TDNN (SE-Res2Net + attentive stats pooling) | 1 | 1 | |
| 087 | Fairseq S2T (MuST-C) | Meta AI | ~150M | Conformer encoder + transformer decoder | 1 | 1 | |
| 088 | GLM-ASR-Nano-2512 | Zhipu AI | 2B | GLM4 + audio encoder | 1 | 1 | |
| 089 | Lite-whisper-large-v3 | — | — | — | 1 | 1 | |
| 090 | Moshi ASR | — | — | — | 1 | 1 | |
| 091 | Owsm_ctc_v4_1B | — | — | — | 1 | 1 | |
| 092 | Parakeet-TDT-1.1B | NVIDIA | 1.1B | FastConformer (TDT) | 1 | 1 | |
| 093 | Parakeet-ctc-1.1b | — | — | — | 1 | 1 | |
| 094 | Parakeet-rnnt-1.1b | — | — | — | 1 | 1 | |
| 095 | ResNet-34 (AM-Softmax, VoxCeleb2) | Community | ~6M | ResNet-34 with AM-Softmax loss | 1 | 1 | |
| 096 | SYMPHONY | — | — | — | 1 | 1 | |
| 097 | Stt-2.6b-en | — | — | — | 1 | 1 | |
| 098 | Wav2Vec 2.0 Base | — | — | — | 1 | 1 | |
| 099 | Wav2Vec 2.0 Large (LS-960) | — | — | — | 1 | 1 | |
| 100 | Whisper Medium (English) | — | — | — | 1 | 1 | |
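
The inclusion rule and the paging shown above (104 models, page 2 of 3, rows 051 to 100) can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: the page size of 50 is inferred from the row numbers on this page, and the names `Model`, `rankable`, `page_count`, and `page_bounds` are hypothetical.

```python
from dataclasses import dataclass
from math import ceil

# Assumption: 50 rows per page, inferred from rows 051-100 appearing on page 2 of 3.
PAGE_SIZE = 50


@dataclass
class Model:
    """Hypothetical record for one indexed model."""
    name: str
    results: int  # number of recorded benchmark results


def rankable(models):
    """Keep only models with at least one recorded benchmark result."""
    return [m for m in models if m.results > 0]


def page_count(total, page_size=PAGE_SIZE):
    """Number of pages needed to list `total` rows."""
    return ceil(total / page_size)


def page_bounds(page, total, page_size=PAGE_SIZE):
    """Return the 1-based (first, last) row numbers shown on a 1-based page."""
    if not 1 <= page <= page_count(total, page_size):
        raise ValueError(f"page {page} is out of range for {total} rows")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last
```

Under these assumptions, `page_count(104)` gives 3 pages and `page_bounds(2, 104)` gives rows 51 to 100, which matches the numbering in the table above.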