Every model, measured.

Start with a research area, drill into a vendor, or page through the full index. Only models with at least one benchmark score appear — a model without a recorded score can’t be ranked.
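A minimal sketch of the inclusion rule. It assumes a hypothetical `Model` record that carries a list of recorded scores; this page does not show the index's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    # Hypothetical record shape, for illustration only.
    name: str
    scores: list = field(default_factory=list)


def rankable(models):
    """Keep only models with at least one recorded benchmark score."""
    return [m for m in models if m.scores]


models = [
    Model("model-a", [0.91, 0.88]),
    Model("model-b"),  # no recorded score, so it is left out of the index
    Model("model-c", [0.75]),
]
print([m.name for m in rankable(models)])  # ['model-a', 'model-c']
```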

Models per vendor: speakleash (253) · OpenAI (85) · Google (71) · Qwen (52) · Alibaba (47) · Anthropic (44) · Microsoft (35) · Meta (30) · Mistral (30) · DeepSeek (28) · google (19) · meta-llama (19) · mistralai (19) · Meta AI (15) · CYFRAGOVPL (14) · Zhipu AI (13) · NVIDIA (10) · SpeakLeash (10) · internlm (10) · xAI (10) · ByteDance (9) · Baidu (8) · PLLuM (8) · ibm-granite (8) · microsoft (8) · Amazon (7) · Google DeepMind (7) · MiniMax (7) · Mistral AI (7) · Remek (7) · Shanghai AI Lab (7) · allenai (7) · utter-project (7) · CohereForAI (6) · Microsoft Research (6) · Salesforce (6) · 01-ai (5) · Alibaba Cloud (5) · Cohere (5) · Moonshot AI (5) · NousResearch (5) · THUML (5) · deepseek-ai (5) · DeepMind (4) · Facebook AI (4) · IBM (4) · Meituan (4) · Stanford (4) · THUDM (4) · UC San Diego (4) · VikParuchuri (4) · gguf-iq (4) · nvidia (4) · openchat (4) · tiiuae (4) · Allen AI (3) · BAAI (3) · Du et al. (3) · ForgeCode (3) · Fudan University (3) · IDEA Research (3) · Liao et al. (3) · Moonshot.AI (3) · Nam Tuan Ly / NII (3) · OPI-PG (3) · OpenDataLab (3) · ViCoS Lab Ljubljana (3) · Xiaomi (3) · Zhao et al. (3) · gguf (3) · gguf11bv30 (3) · gguf7bv30 (3) · upstage (3) · plus 247 smaller vendors (291 models)

§ 01 · Speech models

104 models in Speech · page 1 of 3.
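The page count follows from ceiling division. This sketch assumes 50 rows per page, inferred from rows 001 to 050 shown on page 1. The site does not state its actual page size, and `page_count` and `page_rows` are hypothetical helpers.

```python
PAGE_SIZE = 50  # assumption: inferred from rows 001-050 on page 1


def page_count(total: int, page_size: int) -> int:
    """Pages needed for `total` rows, using integer ceiling division."""
    return (total + page_size - 1) // page_size


def page_rows(total: int, page_size: int, page: int) -> range:
    """1-based row numbers shown on the 1-based `page`."""
    start = (page - 1) * page_size + 1
    end = min(page * page_size, total)
    return range(start, end + 1)


print(page_count(104, PAGE_SIZE))  # 3
last = page_rows(104, PAGE_SIZE, 3)
print(last.start, last[-1])  # 101 104
```

Under that assumption, the last page holds only the four remaining models.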

#	Model	Vendor	Parameters	Architecture	SOTA	Benchmarks	Results
001	Mms-1b-fl102	—	—	—	5	8	9
002	Qwen3.5-Omni-Plus	—	—	—	3	9	9
003	Llama 3 (405B, Instruct)	Meta	—	—	2	9	9
004	Wav2vec2-base-960h	—	—	—	2	8	9
005	Universal-1	AssemblyAI	—	Transformer	2	1	2
006	Whisper Large v2	OpenAI	1.5B	Transformer encoder-decoder	1	10	14
007	Stt_en_fastconformer_ctc_large	—	—	—	1	8	9
008	Audio Flamingo 3	—	—	—	1	7	7
009	Phi-4-Multimodal 5.6B	—	—	—	1	1	1
010	SeamlessM4T v2 Large	Meta AI	2.3B	Unified multilingual/multimodal transformer (UnitY2)	1	1	1
011	WavLM Large (SV)	Microsoft	316M	WavLM Large + ECAPA-TDNN head	1	1	1
012	wav2vec 2.0 Large (960h)	Meta AI	317M	CNN feature encoder + Transformer	—	9	12
013	Asr-conformer-loquacious	—	—	—	—	8	9
014	Asr-wav2vec2-librispeech	—	—	—	—	8	9
015	Data2vec-audio-base-960h	—	—	—	—	8	9
016	Data2vec-audio-large-960h	—	—	—	—	8	9
017	Distil-large-v2	—	—	—	—	8	9
018	Distil-large-v3	—	—	—	—	8	9
019	Distil-medium.en	—	—	—	—	8	9
020	Distil-small.en	—	—	—	—	8	9
021	Granite Speech 3.3 2B	IBM	2B	Transformer	—	8	9
022	Granite Speech 4.1 2B	IBM	2B	Transformer (speech+text)	—	8	9
023	Hubert-large-ls960-ft	—	—	—	—	8	9
024	Hubert-xlarge-ls960-ft	—	—	—	—	8	9
025	Lite-whisper-large-v3-fast	—	—	—	—	8	9
026	Lite-whisper-large-v3-turbo-acc	—	—	—	—	8	9
027	Mms-1b-all	—	—	—	—	8	9
028	Moonshine-base	—	—	—	—	8	9
029	Moonshine-streaming-tiny	—	—	—	—	8	9
030	Moonshine-tiny	—	—	—	—	8	9
031	Parakeet-ctc-0.6b	—	—	—	—	8	9
032	Parakeet-rnnt-0.6b	—	—	—	—	8	9
033	Parakeet-tdt_ctc-110m	—	—	—	—	8	9
034	Phi-4 Multimodal Instruct	Microsoft	6B	Phi-4 multimodal	—	8	9
035	SYMPHONY-ASR	—	—	—	—	8	9
036	Stt_en_conformer_ctc_large	—	—	—	—	8	9
037	Stt_en_conformer_ctc_small	—	—	—	—	8	9
038	Stt_en_fastconformer_transducer_large	—	—	—	—	8	9
039	VibeVoice-ASR-HF	—	—	—	—	8	9
040	Voxtral-Mini-4B-Realtime-2602	Mistral AI	4B	Transformer ASR	—	8	9
041	Wav2vec2-conformer-rel-pos-large-960h-ft	—	—	—	—	8	9
042	Wav2vec2-conformer-rope-large-960h-ft	—	—	—	—	8	9
043	Wav2vec2-large-960h-lv60-self	—	—	—	—	8	9
044	Wav2vec2-large-robust-ft-libri-960h	—	—	—	—	8	9
045	Whisper Large	—	—	—	—	8	9
046	Whisper Large v3	OpenAI	1.5B	Transformer encoder-decoder	—	6	9
047	Whisper Large v3 Turbo	OpenAI	809M	Transformer encoder-decoder (pruned decoder)	—	8	9
048	Whisper-base.en	—	—	—	—	8	9
049	Whisper-medium.en	—	—	—	—	8	9
050	Whisper-small.en	—	—	—	—	8	9