Speech procurement · TTS vendors · 2026

All TTS vendors tracked by CodeSOTA.

A practical vendor directory for text-to-speech: hosted APIs for shipping quickly, low-latency systems for voice agents, voice-cloning platforms, and open-source models you can run or fine-tune yourself.


01 · Vendors

Hosted APIs and open models in one table.

Start with API vendors when operations, streaming and voice libraries matter. Start with open models when local deployment, licensing and marginal cost matter.

Vendor	Type	Tracked model(s)	Best fit	Best MOS	Latest
ElevenLabs	API	ElevenLabs Turbo v2.5, ElevenLabs Flash v2.5	voice cloning	4.8	2025
Cartesia	API	Cartesia Sonic 2	real-time voice agents	4.7	2025
Google	API	Gemini 2.5 Pro TTS, Gemini 2.5 Flash TTS, Google Chirp 3 HD	real-time voice agents	4.7	2025
OpenAI	API	OpenAI TTS HD	general product TTS	4.7	2023
PlayHT	API	PlayHT 3.0	voice cloning	4.6	2025
Gradium	API	Gradium TTS	real-time voice agents	4.4	2026
Sesame	open source	Sesame CSM	self-hosted TTS	4.7	2025
Canopy Labs	open source	Orpheus TTS	self-hosted TTS	4.6	2025
Fish Audio	open source	Fish Audio S2 Pro, Fish Speech 1.5	real-time voice agents	4.6	2026
Coqui	open source	XTTS v2	voice cloning	4.5	2024
Hexgrad	open source	Kokoro v1.0	self-hosted TTS	4.5	2025
Shanghai AI Lab	open source	F5-TTS	voice cloning	4.4	2024
Nari Labs	open source	Dia 1.6B	self-hosted TTS	4.3	2025
SparkAudio	open source	Spark-TTS	multilingual speech	4.3	2025
Supertone	open source	Supertonic 3	real-time voice agents	4.2	2026
Hugging Face	open source	Parler-TTS	self-hosted TTS	4.1	2025
Rhasspy	open source	Piper	real-time voice agents	3.6	2023

Fig 1 · MOS values are catalog signals, not a universal procurement score. Validate on your own prompts, names, numbers, languages, latency budget and consent requirements.
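
One way to validate on your own prompts is to synthesize each hard prompt, transcribe the audio with an ASR system of your choice, and score how much of the text survived. The sketch below is a minimal word error rate in plain Python. The prompts and transcripts are hypothetical placeholders, the ASR step is assumed to happen elsewhere, and the normalization is deliberately naive (real pipelines usually also normalize numbers and punctuation).

```python
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (naive on purpose)."""
    return text.lower().split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical (prompt, ASR transcript) pairs; replace with your own names,
# numbers, URLs and domain vocabulary.
HARD_PROMPTS = [
    ("call 555 0199 before friday", "call 555 0199 before friday"),
    ("ship to dr nguyen at 42 elm street", "ship to dr win at 42 elm street"),
]

for prompt, transcript in HARD_PROMPTS:
    print(f"{word_error_rate(prompt, transcript):.3f}  {prompt}")
```

A score like this measures information preservation only. It says nothing about naturalness, so it complements listening tests rather than replacing them.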

02 · Shortlist

Pick by procurement shape.

Real-time agents

Cartesia, ElevenLabs Flash, Gemini Flash, Gradium

Prioritize TTFB, streaming stability and interruption handling.
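
As a rough illustration of what measuring TTFB and streaming stability can look like, the sketch below times any iterable of audio chunks. It is vendor-neutral: `measure_stream`, `StreamStats` and `fake_stream` are hypothetical names, and the fake stream stands in for a real streaming response. The timer starts when `measure_stream` is called, so in practice you would call it at the moment the request is sent.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class StreamStats:
    ttfb_s: float      # time until the first non-empty chunk
    max_gap_s: float   # largest pause between consecutive chunks
    total_s: float     # time until the last chunk
    total_bytes: int


def measure_stream(chunks: Iterable[bytes],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Consume a chunk stream and record first-byte time and inter-chunk gaps."""
    start = clock()
    ttfb = None
    last = start
    max_gap = 0.0
    total_bytes = 0
    for chunk in chunks:
        now = clock()
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if ttfb is None:
            ttfb = now - start
        else:
            max_gap = max(max_gap, now - last)
        last = now
        total_bytes += len(chunk)
    if ttfb is None:
        raise ValueError("stream produced no audio")
    return StreamStats(ttfb, max_gap, last - start, total_bytes)


def fake_stream(delays_s: Iterable[float], chunk: bytes = b"\x00" * 320) -> Iterator[bytes]:
    """Stand-in for a vendor response: sleeps, then yields a chunk."""
    for delay in delays_s:
        time.sleep(delay)
        yield chunk


print(measure_stream(fake_stream([0.05, 0.01, 0.03])))
```

Run it from the regions your users are in, and repeat the measurement enough times to look at percentiles rather than a single sample.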

Voice cloning

ElevenLabs, PlayHT, Google Chirp 3, Gradium, Coqui

Check consent, similarity, speaker consistency and policy terms.

Open deployment

Sesame, Kokoro, Orpheus, F5-TTS, Dia, Fish Speech

Validate license, hardware footprint, language coverage and serving stack.
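
For a first look at hardware footprint, weight memory can be estimated from parameter count and numeric precision. The sketch below is back-of-envelope arithmetic only: it covers weights, not activations, audio codecs, vocoders or runtime overhead, and it uses the parameter counts named in this directory (Kokoro at 82M, Dia at 1.6B). The function name is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params: float, dtype: str = "fp16") -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as stated in this directory.
for name, params in [("Kokoro v1.0", 82e6), ("Dia 1.6B", 1.6e9)]:
    sizes = ", ".join(
        f"{dtype}: {weight_memory_gb(params, dtype):.2f} GB" for dtype in BYTES_PER_PARAM
    )
    print(f"{name}: {sizes}")
```

Treat the result as a lower bound and confirm it by loading the model on the target hardware.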

Simple hosted TTS

OpenAI, Google, ElevenLabs

Use when integration speed and vendor support beat fine-grained control.
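
A first pass at a shortlist can be made mechanically by filtering catalog rows on type and best fit, then sorting by MOS. The sketch below copies a subset of rows from the vendor table. The `Vendor` class and `shortlist` function are hypothetical helpers, and the result will not match the editorial shortlists above exactly, since those also weigh factors the table does not capture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vendor:
    name: str
    kind: str        # "API" or "open source"
    best_fit: str
    best_mos: float


# A subset of rows copied from the vendor table in this directory.
CATALOG = [
    Vendor("ElevenLabs", "API", "voice cloning", 4.8),
    Vendor("Cartesia", "API", "real-time voice agents", 4.7),
    Vendor("PlayHT", "API", "voice cloning", 4.6),
    Vendor("Gradium", "API", "real-time voice agents", 4.4),
    Vendor("Fish Audio", "open source", "real-time voice agents", 4.6),
    Vendor("Coqui", "open source", "voice cloning", 4.5),
    Vendor("Shanghai AI Lab", "open source", "voice cloning", 4.4),
    Vendor("Rhasspy", "open source", "real-time voice agents", 3.6),
]


def shortlist(best_fit: str, kind: Optional[str] = None) -> List[str]:
    """Vendor names matching the fit (and type, if given), highest MOS first."""
    rows = [
        v for v in CATALOG
        if v.best_fit == best_fit and (kind is None or v.kind == kind)
    ]
    return [v.name for v in sorted(rows, key=lambda v: v.best_mos, reverse=True)]


print(shortlist("voice cloning"))
print(shortlist("real-time voice agents", kind="open source"))
```

Because MOS is a catalog signal rather than a procurement score, the ordering is only a starting point for your own listening and latency tests.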

03 · Company profiles

What each vendor is for.

These profiles translate the model table into procurement language: positioning, buyer fit, practical strengths and the main thing to verify before adoption.

voice cloning

API

ElevenLabs

Commercial TTS quality leader with a large voice library and cloning workflow.

Best buyer: Creators, media products and apps that need polished voices quickly.
Strengths: top MOS, voice library, cloning
Verify: Cost, data policy, latency tier and voice-rights workflow matter at scale.
Tracked models: ElevenLabs Turbo v2.5 (4.8 MOS), ElevenLabs Flash v2.5 (4.6 MOS)

real-time voice agents

API

Cartesia

Low-latency hosted TTS for voice agents and conversational interfaces.

Best buyer: Product teams building real-time phone, agent, avatar or interruptible voice systems.
Strengths: fast first byte, streaming, agent fit
Verify: Benchmark naturalness is only part of the decision; test interruption handling and regional latency.
Tracked models: Cartesia Sonic 2 (4.7 MOS)

real-time voice agents

API

Google

Hyperscaler TTS with Gemini-native audio and Cloud Text-to-Speech coverage.

Best buyer: Enterprises already on Google Cloud or products needing many locales and managed operations.
Strengths: locale coverage, cloud operations, prompted style
Verify: Quality varies by voice, language and product family; compare Chirp, Gemini and classic Cloud TTS separately.
Tracked models: Gemini 2.5 Pro TTS (4.7 MOS), Gemini 2.5 Flash TTS (4.5 MOS), Google Chirp 3 HD (4.4 MOS)

general product TTS

API

OpenAI

Simple hosted TTS API for teams already using OpenAI infrastructure.

Best buyer: Developers who want quick integration, stable docs and a small set of built-in voices.
Strengths: simple API, managed service, developer adoption
Verify: Voice range, style control and cloning depth are narrower than specialist TTS platforms.
Tracked models: OpenAI TTS HD (4.7 MOS)

voice cloning

API

PlayHT

Hosted voice generation platform focused on cloning and creator workflows.

Best buyer: Media, marketing and product teams that need cloned or branded voices without self-hosting.
Strengths: voice cloning, emotion control, hosted workflow
Verify: Compare consent workflow, long-form consistency and cost before large-volume use.
Tracked models: PlayHT 3.0 (4.6 MOS)

real-time voice agents

API

Gradium

Hosted TTS API tracked by CodeSOTA for intelligibility, UTMOS and first-byte latency.

Best buyer: Voice-agent teams that care about streaming, cloning and measured information preservation.
Strengths: streaming, voice cloning, CodeSOTA measured
Verify: Run your own hard-prompt set for names, numbers, URLs and domain vocabulary.
Tracked models: Gradium TTS (4.4 MOS)

self-hosted TTS

open source

Sesame

Open conversational speech model with expressive dialogue quality.

Best buyer: Teams evaluating open alternatives to commercial conversational TTS.
Strengths: conversation, expressiveness, open source
Verify: Measure deployment latency and license fit before treating it as a hosted-vendor replacement.
Tracked models: Sesame CSM (4.7 MOS)

self-hosted TTS

open source

Canopy Labs

Open LLM-style speech generation through Orpheus TTS.

Best buyer: Teams that want an expressive open model they can fine-tune or inspect.
Strengths: emotion tags, custom fine-tuning, open deployment
Verify: Operational burden is on you: serving, latency tuning, voice QA and license review.
Tracked models: Orpheus TTS (4.6 MOS)

real-time voice agents

open source

Fish Audio

Open multilingual speech generation with strong CJK coverage.

Best buyer: Teams experimenting with self-hosted multilingual or regional-language TTS.
Strengths: multilingual, CJK support, open weights
Verify: Validate English naturalness, serving cost and license terms against your exact product use.
Tracked models: Fish Audio S2 Pro (4.6 MOS), Fish Speech 1.5 (4.4 MOS)

voice cloning

open source

Coqui

Open-source TTS toolkit with XTTS voice cloning as the practical draw.

Best buyer: Developers who need self-hosted multilingual cloning and can manage model infrastructure.
Strengths: voice cloning, 17 languages, local control
Verify: Project and license status need checking before commercial deployment.
Tracked models: XTTS v2 (4.5 MOS)

self-hosted TTS

open source

Hexgrad

Tiny open TTS through Kokoro: strong quality for an 82M-parameter model.

Best buyer: Builders who need lightweight local TTS, edge experiments or cheap batch generation.
Strengths: small model, CPU friendly, Apache 2.0
Verify: Not a full hosted vendor; you still need serving, voice management and QA.
Tracked models: Kokoro v1.0 (4.5 MOS)

voice cloning

open source

Shanghai AI Lab

Research-grade open zero-shot cloning through F5-TTS.

Best buyer: Research and engineering teams testing fast open cloning and adaptation.
Strengths: flow matching, zero-shot cloning, research code
Verify: Needs production hardening: model serving, monitoring, safety and voice-rights controls.
Tracked models: F5-TTS (4.4 MOS)

self-hosted TTS

open source

Nari Labs

Dialogue-focused open TTS with non-verbal cues through Dia.

Best buyer: Teams prototyping expressive dialogue, characters or conversational audio.
Strengths: dialogue, non-verbal cues, open model
Verify: Treat as a model profile, not a managed API; validate consistency over long sessions.
Tracked models: Dia 1.6B (4.3 MOS)

multilingual speech

open source

SparkAudio

Open controllable TTS with explicit pitch, speed and emotion attributes.

Best buyer: Teams that need controllable generation experiments rather than a black-box API.
Strengths: attribute control, multilingual, open model
Verify: Validate actual controllability and stability across speakers before product use.
Tracked models: Spark-TTS (4.3 MOS)

real-time voice agents

open source

Supertone

Compact local multilingual TTS through Supertonic 3.

Best buyer: Teams testing on-device or edge speech generation with a small open-weight model.
Strengths: small footprint, local inference, multilingual
Verify: Add same-prompt listening samples and hard-text runs before ranking it against measured rows.
Tracked models: Supertonic 3 (4.2 MOS)

self-hosted TTS

open source

Hugging Face

Open prompt-controlled TTS through Parler-TTS and the wider HF ecosystem.

Best buyer: Research and prototype teams that want open training artifacts and model iteration.
Strengths: prompted style, open tooling, ecosystem
Verify: Production reliability depends on your chosen hosting path and model variant.
Tracked models: Parler-TTS (4.1 MOS)

real-time voice agents

open source

Rhasspy

Lightweight local TTS through Piper for embedded and offline systems.

Best buyer: Home assistant, robotics, kiosk and edge teams that need fast local speech.
Strengths: offline, Raspberry Pi fit, low latency
Verify: Naturalness trails frontier hosted voices; choose it for control and footprint.
Tracked models: Piper (3.6 MOS)
