A practical vendor directory for text-to-speech: hosted APIs for shipping fast, low-latency voice-agent systems, voice-cloning platforms, and open-source models you can run or fine-tune yourself.
Start with API vendors when operations, streaming and voice libraries matter. Start with open models when local deployment, licensing and marginal cost matter.
| Vendor | Type | Tracked model(s) | Best fit | Best MOS | Latest release (year) | Profile | Link |
|---|---|---|---|---|---|---|---|
| ElevenLabs | API | ElevenLabs Turbo v2.5, ElevenLabs Flash v2.5 | voice cloning | 4.8 | 2025 | View | Open |
| Cartesia | API | Cartesia Sonic 2 | real-time voice agents | 4.7 | 2025 | View | Open |
| Google | API | Gemini 2.5 Pro TTS, Gemini 2.5 Flash TTS, Google Chirp 3 HD | real-time voice agents | 4.7 | 2025 | View | Open |
| OpenAI | API | OpenAI TTS HD | general product TTS | 4.7 | 2023 | View | Open |
| PlayHT | API | PlayHT 3.0 | voice cloning | 4.6 | 2025 | View | Open |
| Gradium | API | Gradium TTS | real-time voice agents | 4.4 | 2026 | View | Open |
| Sesame | open source | Sesame CSM | self-hosted TTS | 4.7 | 2025 | View | Open |
| Canopy Labs | open source | Orpheus TTS | self-hosted TTS | 4.6 | 2025 | View | Open |
| Fish Audio | open source | Fish Audio S2 Pro, Fish Speech 1.5 | real-time voice agents | 4.6 | 2026 | View | Open |
| Coqui | open source | XTTS v2 | voice cloning | 4.5 | 2024 | View | Open |
| Hexgrad | open source | Kokoro v1.0 | self-hosted TTS | 4.5 | 2025 | View | Open |
| Shanghai AI Lab | open source | F5-TTS | voice cloning | 4.4 | 2024 | View | Open |
| Nari Labs | open source | Dia 1.6B | self-hosted TTS | 4.3 | 2025 | View | Open |
| SparkAudio | open source | Spark-TTS | multilingual speech | 4.3 | 2025 | View | Open |
| Supertone | open source | Supertonic 3 | real-time voice agents | 4.2 | 2026 | View | Open |
| Hugging Face | open source | Parler-TTS | self-hosted TTS | 4.1 | 2025 | View | Open |
| Rhasspy | open source | Piper | real-time voice agents | 3.6 | 2023 | View | Open |
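One way to turn the table into a shortlist is to filter by best fit and type, then rank by MOS. The sketch below is illustrative only: the `Vendor` class and `shortlist` function are hypothetical names, and the rows are a subset copied from the table above, not a complete or authoritative dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vendor:
    name: str
    kind: str        # "API" or "open source", from the Type column
    best_fit: str    # from the Best fit column
    best_mos: float  # from the Best MOS column
    latest: int      # year from the Latest column


# A subset of rows copied from the table above (hypothetical container).
VENDORS = [
    Vendor("ElevenLabs", "API", "voice cloning", 4.8, 2025),
    Vendor("Cartesia", "API", "real-time voice agents", 4.7, 2025),
    Vendor("Gradium", "API", "real-time voice agents", 4.4, 2026),
    Vendor("Fish Audio", "open source", "real-time voice agents", 4.6, 2026),
    Vendor("Coqui", "open source", "voice cloning", 4.5, 2024),
    Vendor("Rhasspy", "open source", "real-time voice agents", 3.6, 2023),
]


def shortlist(
    vendors: List[Vendor],
    best_fit: str,
    kind: Optional[str] = None,
    min_mos: float = 0.0,
) -> List[Vendor]:
    """Return vendors matching the use case, highest MOS first."""
    matches = [
        v
        for v in vendors
        if v.best_fit == best_fit
        and (kind is None or v.kind == kind)
        and v.best_mos >= min_mos
    ]
    return sorted(matches, key=lambda v: v.best_mos, reverse=True)
```

For example, `shortlist(VENDORS, "real-time voice agents", kind="open source", min_mos=4.0)` keeps only the open-source real-time candidates at or above the MOS floor. MOS is only one axis, so the result is a starting list to verify, not a ranking to adopt as-is.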
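For real-time voice agents (for example Cartesia, ElevenLabs Flash, Gemini Flash or Gradium), prioritize time to first byte (TTFB), streaming stability and interruption handling.
For voice cloning, check consent, similarity, speaker consistency and policy terms.
For self-hosted models, validate license, hardware footprint, language coverage and serving stack.
Choose a hosted API when integration speed and vendor support beat fine-grained control.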
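TTFB and streaming stability can be checked with a small timing harness before committing to a vendor. The sketch below assumes the vendor client exposes audio as an iterator of byte chunks; `measure_stream` and the metric names are hypothetical, and the gap statistics are only a rough proxy for stability. It does not cover interruption handling.

```python
import time
from statistics import pstdev
from typing import Callable, Dict, Iterable


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time a stream of audio chunks.

    Reports seconds until the first non-empty chunk (TTFB), the largest gap
    between consecutive chunks, and the standard deviation of those gaps.
    """
    start = clock()
    arrivals = []
    total_bytes = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore keep-alive or empty chunks
        arrivals.append(clock())
        total_bytes += len(chunk)
    if not arrivals:
        raise ValueError("stream produced no audio")
    # Gaps between consecutive chunk arrivals; empty for a single chunk.
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    return {
        "ttfb_s": arrivals[0] - start,
        "max_gap_s": max(gaps, default=0.0),
        "gap_stdev_s": pstdev(gaps) if gaps else 0.0,
        "total_bytes": float(total_bytes),
    }
```

For the TTFB figure to include request latency, pass a lazy iterator that only issues the request when iteration begins, for instance a generator wrapping the HTTP client's streaming response. Running the same text through each candidate several times and comparing the distributions is more informative than a single measurement.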
These profiles translate the model table into procurement language: positioning, buyer fit, practical strengths and the main thing to verify before adoption.
ElevenLabs: Commercial TTS quality leader with a large voice library and cloning workflow.
Cartesia: Low-latency hosted TTS for voice agents and conversational interfaces.
Google: Hyperscaler TTS with Gemini-native audio and Cloud Text-to-Speech coverage.
OpenAI: Simple hosted TTS API for teams already using OpenAI infrastructure.
PlayHT: Hosted voice generation platform focused on cloning and creator workflows.
Gradium: Hosted TTS API tracked by CodeSOTA for intelligibility, UTMOS and first-byte latency.
Sesame: Open conversational speech model with expressive dialogue quality.
Canopy Labs: Open LLM-style speech generation through Orpheus TTS.
Fish Audio: Open multilingual speech generation with strong CJK coverage.
Coqui: Open-source TTS toolkit with XTTS voice cloning as the practical draw.
Hexgrad: Tiny open TTS through Kokoro: strong quality for an 82M-parameter model.
Shanghai AI Lab: Research-grade open zero-shot cloning through F5-TTS.
Nari Labs: Dialogue-focused open TTS with non-verbal cues through Dia.
SparkAudio: Open controllable TTS with explicit pitch, speed and emotion attributes.
Supertone: Compact local multilingual TTS through Supertonic 3.
Hugging Face: Open prompt-controlled TTS through Parler-TTS and the wider HF ecosystem.
Rhasspy: Lightweight local TTS through Piper for embedded and offline systems.