Speech procurement · TTS vendors · 2026

All TTS vendors tracked by CodeSOTA.

A practical vendor directory for text-to-speech: hosted APIs for shipping fast, low-latency voice-agent systems, voice-cloning platforms, and open-source models you can run or fine-tune yourself.

01 · Vendors

Hosted APIs and open models in one table.

Start with API vendors when operations, streaming and voice libraries matter. Start with open models when local deployment, licensing and marginal cost matter.

| Vendor | Type | Tracked model(s) | Best fit | Best MOS | Latest |
| --- | --- | --- | --- | --- | --- |
| ElevenLabs | API | ElevenLabs Turbo v2.5, ElevenLabs Flash v2.5 | voice cloning | 4.8 | 2025 |
| Cartesia | API | Cartesia Sonic 2 | real-time voice agents | 4.7 | 2025 |
| Google | API | Gemini 2.5 Pro TTS, Gemini 2.5 Flash TTS, Google Chirp 3 HD | real-time voice agents | 4.7 | 2025 |
| OpenAI | API | OpenAI TTS HD | general product TTS | 4.7 | 2023 |
| PlayHT | API | PlayHT 3.0 | voice cloning | 4.6 | 2025 |
| Gradium | API | Gradium TTS | real-time voice agents | 4.4 | 2026 |
| Sesame | open source | Sesame CSM | self-hosted TTS | 4.7 | 2025 |
| Canopy Labs | open source | Orpheus TTS | self-hosted TTS | 4.6 | 2025 |
| Fish Audio | open source | Fish Audio S2 Pro, Fish Speech 1.5 | real-time voice agents | 4.6 | 2026 |
| Coqui | open source | XTTS v2 | voice cloning | 4.5 | 2024 |
| Hexgrad | open source | Kokoro v1.0 | self-hosted TTS | 4.5 | 2025 |
| Shanghai AI Lab | open source | F5-TTS | voice cloning | 4.4 | 2024 |
| Nari Labs | open source | Dia 1.6B | self-hosted TTS | 4.3 | 2025 |
| SparkAudio | open source | Spark-TTS | multilingual speech | 4.3 | 2025 |
| Supertone | open source | Supertonic 3 | real-time voice agents | 4.2 | 2026 |
| Hugging Face | open source | Parler-TTS | self-hosted TTS | 4.1 | 2025 |
| Rhasspy | open source | Piper | real-time voice agents | 3.6 | 2023 |

Fig 1 · MOS values are catalog signals, not a universal procurement score. Validate on your own prompts, names, numbers, languages, latency budget and consent requirements.
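
As a rough illustration of using the table as a first-pass filter, the sketch below narrows a handful of rows by type, best fit and a minimum catalog MOS. The `Vendor` class and the `shortlist` helper are hypothetical names introduced here, not part of any CodeSOTA tooling, and only a few rows are copied in. The result is a list of candidates to test, not a ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vendor:
    name: str
    kind: str       # "API" or "open source", as in the Type column
    best_fit: str   # value from the Best fit column
    best_mos: float  # value from the Best MOS column


# A few rows copied from the table; extend with the remaining rows as needed.
CATALOG = [
    Vendor("ElevenLabs", "API", "voice cloning", 4.8),
    Vendor("Cartesia", "API", "real-time voice agents", 4.7),
    Vendor("OpenAI", "API", "general product TTS", 4.7),
    Vendor("Sesame", "open source", "self-hosted TTS", 4.7),
    Vendor("Coqui", "open source", "voice cloning", 4.5),
    Vendor("Rhasspy", "open source", "real-time voice agents", 3.6),
]


def shortlist(catalog, kind=None, best_fit=None, min_mos=0.0):
    """Return vendors matching the filters, highest catalog MOS first."""
    rows = [
        v for v in catalog
        if (kind is None or v.kind == kind)
        and (best_fit is None or v.best_fit == best_fit)
        and v.best_mos >= min_mos
    ]
    return sorted(rows, key=lambda v: v.best_mos, reverse=True)


if __name__ == "__main__":
    for v in shortlist(CATALOG, best_fit="voice cloning"):
        print(v.name, v.kind, v.best_mos)
```

Every candidate that survives the filter still needs the validation the caption describes: your own prompts, languages, latency budget and consent requirements.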
02 · Shortlist

Pick by procurement shape.

Real-time agents

Cartesia, ElevenLabs Flash, Gemini Flash, Gradium

Prioritize TTFB, streaming stability and interruption handling.
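
One way to compare time to first byte (TTFB) across vendors is to time any iterator of audio chunks, as in the minimal sketch below. It assumes the vendor SDK exposes its streaming response as an iterable of `bytes`. `measure_ttfb` and `fake_stream` are hypothetical names, and `fake_stream` only simulates a delayed response so the example runs without network access.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttfb(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a chunk stream; return (ttfb_seconds, total_seconds, total_bytes).

    ttfb_seconds is None if no non-empty chunk ever arrives.
    With a real client, make sure the request is sent lazily (on first
    iteration) or start your own timer before issuing it.
    """
    start = clock()
    ttfb = None
    total_bytes = 0
    for chunk in stream:
        if ttfb is None and chunk:
            ttfb = clock() - start  # first audible payload
        total_bytes += len(chunk)
    return ttfb, clock() - start, total_bytes


def fake_stream(first_delay: float = 0.05, chunks: int = 3,
                chunk_size: int = 320) -> Iterator[bytes]:
    """Hypothetical stand-in for a vendor's streaming response."""
    time.sleep(first_delay)  # simulated network and model latency
    for _ in range(chunks):
        yield b"\x00" * chunk_size


if __name__ == "__main__":
    ttfb, total, size = measure_ttfb(fake_stream())
    print(f"TTFB {ttfb * 1000:.0f} ms, total {total * 1000:.0f} ms, {size} bytes")
```

Run it many times per vendor and region and compare percentiles rather than a single reading. Streaming stability and interruption handling need separate tests.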

Voice cloning

ElevenLabs, PlayHT, Google Chirp 3, Gradium, Coqui

Check consent, similarity, speaker consistency and policy terms.

Open deployment

Sesame, Kokoro, Orpheus, F5-TTS, Dia, Fish Speech

Validate license, hardware footprint, language coverage and serving stack.

Simple hosted TTS

OpenAI, Google, ElevenLabs

Use when integration speed and vendor support beat fine-grained control.

03 · Company profiles

What each vendor is for.

These profiles translate the model table into procurement language: positioning, buyer fit, practical strengths and the main thing to verify before adoption.

voice cloning
API

ElevenLabs

Commercial TTS quality leader with a large voice library and cloning workflow.

Best buyer
Creators, media products and apps that need polished voices quickly.
Strengths
top MOS, voice library, cloning
Verify
Check cost, data policy, latency tier and voice-rights workflow before scaling up.
Tracked models
ElevenLabs Turbo v2.5 (4.8 MOS), ElevenLabs Flash v2.5 (4.6 MOS)
real-time voice agents
API

Cartesia

Low-latency hosted TTS for voice agents and conversational interfaces.

Best buyer
Product teams building real-time phone, agent, avatar or interruptible voice systems.
Strengths
fast first byte, streaming, agent fit
Verify
Benchmark naturalness is only part of the decision; test interruption handling and regional latency.
Tracked models
Cartesia Sonic 2 (4.7 MOS)
real-time voice agents
API

Google

Hyperscaler TTS with Gemini-native audio and Cloud Text-to-Speech coverage.

Best buyer
Enterprises already on Google Cloud or products needing many locales and managed operations.
Strengths
locale coverage, cloud operations, prompted style
Verify
Quality varies by voice, language and product family; compare Chirp, Gemini and classic Cloud TTS separately.
Tracked models
Gemini 2.5 Pro TTS (4.7 MOS), Gemini 2.5 Flash TTS (4.5 MOS), Google Chirp 3 HD (4.4 MOS)
general product TTS
API

OpenAI

Simple hosted TTS API for teams already using OpenAI infrastructure.

Best buyer
Developers who want quick integration, stable docs and a small set of built-in voices.
Strengths
simple API, managed service, developer adoption
Verify
Voice range, style control and cloning depth are narrower than specialist TTS platforms.
Tracked models
OpenAI TTS HD (4.7 MOS)
voice cloning
API

PlayHT

Hosted voice generation platform focused on cloning and creator workflows.

Best buyer
Media, marketing and product teams that need cloned or branded voices without self-hosting.
Strengths
voice cloning, emotion control, hosted workflow
Verify
Compare consent workflow, long-form consistency and cost before large-volume use.
Tracked models
PlayHT 3.0 (4.6 MOS)
real-time voice agents
API

Gradium

Hosted TTS API tracked by CodeSOTA for intelligibility, UTMOS and first-byte latency.

Best buyer
Voice-agent teams that care about streaming, cloning and measured intelligibility, meaning whether names, numbers and other details survive synthesis.
Strengths
streaming, voice cloning, CodeSOTA measured
Verify
Run your own hard-prompt set for names, numbers, URLs and domain vocabulary.
Tracked models
Gradium TTS (4.4 MOS)
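
The hard-prompt check suggested for Gradium can be scored in several ways. One common approach is to synthesize each prompt, transcribe the audio with a speech recognizer, and compute word error rate (WER) against the original text. The sketch below covers only the scoring step, under stated assumptions: the prompts are illustrative, `normalize` and `word_error_rate` are hypothetical helper names, and synthesis and transcription happen elsewhere. Spoken-form normalization is left out, so a digit string that the recognizer writes as "four one five" would count as errors.

```python
import re

# Illustrative hard prompts: names, numbers, URLs, domain vocabulary.
HARD_PROMPTS = [
    "Call Dr. Nguyen at 415-555-0132 before 9:30 a.m.",
    "Open example.com/billing and enter code QX7-42.",
    "Take 20 mg of atorvastatin once daily.",
]


def normalize(text: str) -> list:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The hypothesis would come from transcribing the synthesized audio.
    print(word_error_rate("send the invoice to Ana", "send the invoice to Anna"))
```

Because recognizer mistakes are counted together with synthesis mistakes, use the same recognizer for every vendor and listen to the worst-scoring clips before drawing conclusions.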
self-hosted TTS
open source

Sesame

Open conversational speech model with expressive dialogue quality.

Best buyer
Teams evaluating open alternatives to commercial conversational TTS.
Strengths
conversation, expressiveness, open source
Verify
Measure deployment latency and license fit before treating it as a hosted-vendor replacement.
Tracked models
Sesame CSM (4.7 MOS)
self-hosted TTS
open source

Canopy Labs

Open LLM-style speech generation through Orpheus TTS.

Best buyer
Teams that want an expressive open model they can fine-tune or inspect.
Strengths
emotion tags, custom fine-tuning, open deployment
Verify
Operational burden is on you: serving, latency tuning, voice QA and license review.
Tracked models
Orpheus TTS (4.6 MOS)
real-time voice agents
open source

Fish Audio

Open multilingual speech generation with strong CJK coverage.

Best buyer
Teams experimenting with self-hosted multilingual or regional-language TTS.
Strengths
multilingual, CJK support, open weights
Verify
Validate English naturalness, serving cost and license terms against your exact product use.
Tracked models
Fish Audio S2 Pro (4.6 MOS), Fish Speech 1.5 (4.4 MOS)
voice cloning
open source

Coqui

Open-source TTS toolkit with XTTS voice cloning as the practical draw.

Best buyer
Developers who need self-hosted multilingual cloning and can manage model infrastructure.
Strengths
voice cloning, 17 languages, local control
Verify
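Coqui wound down as a company in early 2024, and XTTS v2 was released under the Coqui Public Model License, which restricts commercial use; confirm current maintenance and license terms before commercial deployment.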
Tracked models
XTTS v2 (4.5 MOS)
self-hosted TTS
open source

Hexgrad

Tiny open TTS through Kokoro: strong quality for an 82M-parameter model.

Best buyer
Builders who need lightweight local TTS, edge experiments or cheap batch generation.
Strengths
small model, CPU friendly, Apache 2.0
Verify
Not a full hosted vendor; you still need serving, voice management and QA.
Tracked models
Kokoro v1.0 (4.5 MOS)
voice cloning
open source

Shanghai AI Lab

Research-grade open zero-shot cloning through F5-TTS.

Best buyer
Research and engineering teams testing fast open cloning and adaptation.
Strengths
flow matching, zero-shot cloning, research code
Verify
Needs production hardening: model serving, monitoring, safety and voice-rights controls.
Tracked models
F5-TTS (4.4 MOS)
self-hosted TTS
open source

Nari Labs

Dialogue-focused open TTS with non-verbal cues through Dia.

Best buyer
Teams prototyping expressive dialogue, characters or conversational audio.
Strengths
dialogue, non-verbal cues, open model
Verify
Treat as a model profile, not a managed API; validate consistency over long sessions.
Tracked models
Dia 1.6B (4.3 MOS)
multilingual speech
open source

SparkAudio

Open controllable TTS with explicit pitch, speed and emotion attributes.

Best buyer
Teams that need controllable generation experiments rather than a black-box API.
Strengths
attribute control, multilingual, open model
Verify
Validate actual controllability and stability across speakers before product use.
Tracked models
Spark-TTS (4.3 MOS)
real-time voice agents
open source

Supertone

Compact local multilingual TTS through Supertonic 3.

Best buyer
Teams testing on-device or edge speech generation with a small open-weight model.
Strengths
small footprint, local inference, multilingual
Verify
Collect same-prompt listening samples and hard-text runs before ranking it against vendors with measured results.
Tracked models
Supertonic 3 (4.2 MOS)
self-hosted TTS
open source

Hugging Face

Open prompt-controlled TTS through Parler-TTS and the wider HF ecosystem.

Best buyer
Research and prototype teams that want open training artifacts and model iteration.
Strengths
prompted style, open tooling, ecosystem
Verify
Production reliability depends on your chosen hosting path and model variant.
Tracked models
Parler-TTS (4.1 MOS)
real-time voice agents
open source

Rhasspy

Lightweight local TTS through Piper for embedded and offline systems.

Best buyer
Home assistant, robotics, kiosk and edge teams that need fast local speech.
Strengths
offline, Raspberry Pi fit, low latency
Verify
Naturalness trails frontier hosted voices; choose it for control and footprint.
Tracked models
Piper (3.6 MOS)