Codesota · Models · 1,357 models indexed · 104 match filter

Every model, measured.

Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
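The inclusion rule above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the `Model` record, the `rankable` helper, and the sample entries (`model-a`, `vendor-x`, `bench-1`, and so on) are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical record for one indexed model."""
    name: str
    area: str
    vendor: str = ""
    # Maps benchmark name -> recorded score; empty means nothing was recorded.
    scores: dict[str, float] = field(default_factory=dict)


def rankable(models, area=None, vendor=None):
    """Keep models with at least one recorded score, optionally narrowed
    to a single research area and/or vendor."""
    return [
        m for m in models
        if m.scores
        and (area is None or m.area == area)
        and (vendor is None or m.vendor == vendor)
    ]


# Placeholder catalog: names and numbers are invented for illustration.
catalog = [
    Model("model-a", "Speech", "vendor-x", {"bench-1": 4.2}),
    Model("model-b", "Speech", "vendor-y"),  # no score, so it is excluded
    Model("model-c", "Vision", "vendor-x", {"bench-2": 88.0}),
]

speech = rankable(catalog, area="Speech")
print([m.name for m in speech])
```

Here `model-b` drops out because it has no recorded score, and `model-c` drops out because it belongs to a different area.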

Vendor: Areas overview

speakleash (253) · OpenAI (85) · Google (71) · Qwen (52) · Alibaba (47) · Anthropic (44) · Microsoft (35) · Meta (30) · Mistral (30) · DeepSeek (28) · google (19) · meta-llama (19) · mistralai (19) · Meta AI (15) · CYFRAGOVPL (14) · Zhipu AI (13) · NVIDIA (10) · SpeakLeash (10) · internlm (10) · xAI (10) · ByteDance (9) · Baidu (8) · PLLuM (8) · ibm-granite (8) · microsoft (8) · Amazon (7) · Google DeepMind (7) · MiniMax (7) · Mistral AI (7) · Remek (7) · Shanghai AI Lab (7) · allenai (7) · utter-project (7) · CohereForAI (6) · Microsoft Research (6) · Salesforce (6) · 01-ai (5) · Alibaba Cloud (5) · Cohere (5) · Moonshot AI (5) · NousResearch (5) · THUML (5) · deepseek-ai (5) · DeepMind (4) · Facebook AI (4) · IBM (4) · Meituan (4) · Stanford (4) · THUDM (4) · UC San Diego (4) · VikParuchuri (4) · gguf-iq (4) · nvidia (4) · openchat (4) · tiiuae (4) · Allen AI (3) · BAAI (3) · Du et al. (3) · ForgeCode (3) · Fudan University (3) · IDEA Research (3) · Liao et al. (3) · Moonshot.AI (3) · Nam Tuan Ly / NII (3) · OPI-PG (3) · OpenDataLab (3) · ViCoS Lab Ljubljana (3) · Xiaomi (3) · Zhao et al. (3) · gguf (3) · gguf11bv30 (3) · gguf7bv30 (3) · upstage (3) · + 247 smaller vendors (291 models)
§ 01 · Speech models

104 models in Speech · page 1 of 3.

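| # | Model | Vendor | Parameters | Architecture | SOTA | Benchmarks | Results |
|---|---|---|---|---|---|---|---|
| 001 | Mms-1b-fl102 | | | | 5 | 8 | 9 |
| 002 | Qwen3.5-Omni-Plus | | | | 3 | 9 | 9 |
| 003 | Llama 3 (405B, Instruct) | Meta | | | 2 | 9 | 9 |
| 004 | Wav2vec2-base-960h | | | | 2 | 8 | 9 |
| 005 | Universal-1 | AssemblyAI | | Transformer | 2 | 1 | 2 |
| 006 | Whisper Large v2 | OpenAI | 1.5B | Transformer encoder-decoder | 1 | 10 | 14 |
| 007 | Stt_en_fastconformer_ctc_large | | | | 1 | 8 | 9 |
| 008 | Audio Flamingo 3 | | | | 1 | 7 | 7 |
| 009 | Phi-4-Multimodal 5.6B | | | | 1 | 1 | 1 |
| 010 | SeamlessM4T v2 Large | Meta AI | 2.3B | Unified multilingual/multimodal transformer (UnitY2) | 1 | 1 | 1 |
| 011 | WavLM Large (SV) | Microsoft | 316M | WavLM Large + ECAPA-TDNN head | 1 | 1 | 1 |
| 012 | wav2vec 2.0 Large (960h) | Meta AI | 317M | CNN feature encoder + Transformer | | 9 | 12 |
| 013 | Asr-conformer-loquacious | | | | | 8 | 9 |
| 014 | Asr-wav2vec2-librispeech | | | | | 8 | 9 |
| 015 | Data2vec-audio-base-960h | | | | | 8 | 9 |
| 016 | Data2vec-audio-large-960h | | | | | 8 | 9 |
| 017 | Distil-large-v2 | | | | | 8 | 9 |
| 018 | Distil-large-v3 | | | | | 8 | 9 |
| 019 | Distil-medium.en | | | | | 8 | 9 |
| 020 | Distil-small.en | | | | | 8 | 9 |
| 021 | Granite Speech 3.3 2B | IBM | 2B | Transformer | | 8 | 9 |
| 022 | Granite Speech 4.1 2B | IBM | 2B | Transformer (speech+text) | | 8 | 9 |
| 023 | Hubert-large-ls960-ft | | | | | 8 | 9 |
| 024 | Hubert-xlarge-ls960-ft | | | | | 8 | 9 |
| 025 | Lite-whisper-large-v3-fast | | | | | 8 | 9 |
| 026 | Lite-whisper-large-v3-turbo-acc | | | | | 8 | 9 |
| 027 | Mms-1b-all | | | | | 8 | 9 |
| 028 | Moonshine-base | | | | | 8 | 9 |
| 029 | Moonshine-streaming-tiny | | | | | 8 | 9 |
| 030 | Moonshine-tiny | | | | | 8 | 9 |
| 031 | Parakeet-ctc-0.6b | | | | | 8 | 9 |
| 032 | Parakeet-rnnt-0.6b | | | | | 8 | 9 |
| 033 | Parakeet-tdt_ctc-110m | | | | | 8 | 9 |
| 034 | Phi-4 Multimodal Instruct | Microsoft | 6B | Phi-4 multimodal | | 8 | 9 |
| 035 | SYMPHONY-ASR | | | | | 8 | 9 |
| 036 | Stt_en_conformer_ctc_large | | | | | 8 | 9 |
| 037 | Stt_en_conformer_ctc_small | | | | | 8 | 9 |
| 038 | Stt_en_fastconformer_transducer_large | | | | | 8 | 9 |
| 039 | VibeVoice-ASR-HF | | | | | 8 | 9 |
| 040 | Voxtral-Mini-4B-Realtime-2602 | Mistral AI | 4B | Transformer ASR | | 8 | 9 |
| 041 | Wav2vec2-conformer-rel-pos-large-960h-ft | | | | | 8 | 9 |
| 042 | Wav2vec2-conformer-rope-large-960h-ft | | | | | 8 | 9 |
| 043 | Wav2vec2-large-960h-lv60-self | | | | | 8 | 9 |
| 044 | Wav2vec2-large-robust-ft-libri-960h | | | | | 8 | 9 |
| 045 | Whisper Large | | | | | 8 | 9 |
| 046 | Whisper Large v3 | OpenAI | 1.5B | Transformer encoder-decoder | | 6 | 9 |
| 047 | Whisper Large v3 Turbo | OpenAI | 809M | Transformer encoder-decoder (pruned decoder) | | 8 | 9 |
| 048 | Whisper-base.en | | | | | 8 | 9 |
| 049 | Whisper-medium.en | | | | | 8 | 9 |
| 050 | Whisper-small.en | | | | | 8 | 9 |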