Codesota · Models · 1,357 models indexed · 104 match filter
Editorial · Models

Every model, measured.

Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.

Vendor: Areas overview, speakleash · 253, OpenAI · 85, Google · 71, Qwen · 52, Alibaba · 47, Anthropic · 44, Microsoft · 35, Meta · 30, Mistral · 30, DeepSeek · 28, google · 19, meta-llama · 19, mistralai · 19, Meta AI · 15, CYFRAGOVPL · 14, Zhipu AI · 13, NVIDIA · 10, SpeakLeash · 10, internlm · 10, xAI · 10, ByteDance · 9, Baidu · 8, PLLuM · 8, ibm-granite · 8, microsoft · 8, Amazon · 7, Google DeepMind · 7, MiniMax · 7, Mistral AI · 7, Remek · 7, Shanghai AI Lab · 7, allenai · 7, utter-project · 7, CohereForAI · 6, Microsoft Research · 6, Salesforce · 6, 01-ai · 5, Alibaba Cloud · 5, Cohere · 5, Moonshot AI · 5, NousResearch · 5, THUML · 5, deepseek-ai · 5, DeepMind · 4, Facebook AI · 4, IBM · 4, Meituan · 4, Stanford · 4, THUDM · 4, UC San Diego · 4, VikParuchuri · 4, gguf-iq · 4, nvidia · 4, openchat · 4, tiiuae · 4, Allen AI · 3, BAAI · 3, Du et al. · 3, ForgeCode · 3, Fudan University · 3, IDEA Research · 3, Liao et al. · 3, Moonshot.AI · 3, Nam Tuan Ly / NII · 3, OPI-PG · 3, OpenDataLab · 3, ViCoS Lab Ljubljana · 3, Xiaomi · 3, Zhao et al. · 3, gguf · 3, gguf11bv30 · 3, gguf7bv30 · 3, upstage · 3, + 247 smaller vendors (291 models)
§ 01 · Speech models

104 models in Speech · page 2 of 3.

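| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 051 | Whisper-tiny.en | | | | | 8 | 9 |
| 052 | Niagara-38m-batch.en | | | | | 8 | 8 |
| 053 | Qwen3-ASR-1.7B | Alibaba | 1.7B | Transformer (Qwen3 backbone) | | 7 | 8 |
| 054 | Cohere Transcribe (Mar 2026) | Cohere | 2B | Transformer ASR | | 6 | 7 |
| 055 | LongCat-Flash-Omni | | | | | 7 | 7 |
| 056 | Canary-Qwen-2.5B | NVIDIA | 2.5B | FastConformer encoder + Qwen2 LM decoder | | 6 | 6 |
| 057 | Owsm_ctc_v3.1_1B | | | | | 5 | 6 |
| 058 | Parakeet-tdt-0.6b-v2 | | | | | 5 | 6 |
| 059 | Moonshine-streaming-small | | | | | 4 | 5 |
| 060 | Niagara-19m-batch.en | | | | | 5 | 5 |
| 061 | Granite Speech 3.3 8B | IBM | 8B | Transformer | | 4 | 4 |
| 062 | Canary-1B | NVIDIA | 1B | FastConformer encoder + Transformer decoder | | 1 | 3 |
| 063 | Moonshine Streaming Medium | Useful Sensors | 245M | Causal encoder-decoder | | 2 | 3 |
| 064 | Canary-1B-Flash | NVIDIA | 1B | FastConformer + TDT decoder | | 2 | 2 |
| 065 | Distil-large-v3.5 | | | | | 2 | 2 |
| 066 | Google USM | Google | 2B | Conformer encoder + RNN-T/CTC | | 1 | 2 |
| 067 | Granite 4.0 1B Speech | IBM | 1B | Transformer | | 2 | 2 |
| 068 | HuBERT Large (LS-960) | Meta AI | 317M | CNN + Transformer (BERT-style) | | 1 | 2 |
| 069 | Lite-whisper-large-v3-acc | | | | | 2 | 2 |
| 070 | Llama 3 Speech (70B) | | | | | 2 | 2 |
| 071 | Parakeet-CTC-1.1B | NVIDIA / Suno | 1.1B | FastConformer-CTC | | 1 | 2 |
| 072 | Parakeet-tdt-0.6b-v3 | | | | | 2 | 2 |
| 073 | Pulse STT | Smallest AI | | Proprietary streaming STT | | 1 | 2 |
| 074 | Qwen3-ASR-0.6B | Alibaba | 0.6B | Transformer (Qwen3 backbone) | | 2 | 2 |
| 075 | Voxtral-Mini-3B-2507 | | | | | 2 | 2 |
| 076 | Voxtral-Small-24B-2507 | Mistral AI | 24B | Large multimodal LM with audio encoder | | 2 | 2 |
| 077 | Canary-180M-Flash | NVIDIA | 180M | FastConformer-Small + TDT | | 1 | 1 |
| 078 | Canary-1b-v2 | | | | | 1 | 1 |
| 079 | Conformer-CTC Large | NVIDIA / NeMo | 118M | Conformer (Conv + Attention) + CTC | | 1 | 1 |
| 080 | CrisperWhisper | nyrahealth | 1.5B | Whisper fine-tune with alignment | | 1 | 1 |
| 081 | Distil-Whisper Large v2 | | | | | 1 | 1 |
| 082 | Distil-Whisper Large v3 | | | | | 1 | 1 |
| 083 | Distil-Whisper Large v3.5 | | | | | 1 | 1 |
| 084 | Distil-Whisper Medium (English) | | | | | 1 | 1 |
| 085 | Distil-Whisper Small (English) | | | | | 1 | 1 |
| 086 | ECAPA-TDNN | Ghent University | ~14.7M | ECAPA-TDNN (SE-Res2Net + attentive stats pooling) | | 1 | 1 |
| 087 | Fairseq S2T (MuST-C) | Meta AI | ~150M | Conformer encoder + transformer decoder | | 1 | 1 |
| 088 | GLM-ASR-Nano-2512 | Zhipu AI | 2B | GLM4 + audio encoder | | 1 | 1 |
| 089 | Lite-whisper-large-v3 | | | | | 1 | 1 |
| 090 | Moshi ASR | | | | | 1 | 1 |
| 091 | Owsm_ctc_v4_1B | | | | | 1 | 1 |
| 092 | Parakeet-TDT-1.1B | NVIDIA | 1.1B | FastConformer (TDT) | | 1 | 1 |
| 093 | Parakeet-ctc-1.1b | | | | | 1 | 1 |
| 094 | Parakeet-rnnt-1.1b | | | | | 1 | 1 |
| 095 | ResNet-34 (AM-Softmax, VoxCeleb2) | Community | ~6M | ResNet-34 with AM-Softmax loss | | 1 | 1 |
| 096 | SYMPHONY | | | | | 1 | 1 |
| 097 | Stt-2.6b-en | | | | | 1 | 1 |
| 098 | Wav2Vec 2.0 Base | | | | | 1 | 1 |
| 099 | Wav2Vec 2.0 Large (LS-960) | | | | | 1 | 1 |
| 100 | Whisper Medium (English) | | | | | 1 | 1 |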