Piper vs Kokoro: Kokoro wins quality, Piper wins appliances.

Direct answer: choose Kokoro when people will judge the voice. Choose Piper when the system needs simple offline speech, predictable packaging, and many practical voices. The important caveat: CodeSOTA has measured Kokoro on hard text, but Piper has not had the same run yet, so the quality gap is not quantified.


§ 01 · Short decision table

The winner changes with the job.

Question	Winner	Why
Local voice quality default	Kokoro	More modern, more listenable, and CodeSOTA has an artifact-backed Kokoro run.
Small appliance / offline fleet	Piper	Simpler engine path, broad voice catalog, and a long deployment history in Home Assistant-style setups.
Voice-agent demo people will judge	Kokoro	A better first impression matters when the voice is part of the product surface.
Status prompts, kiosks, alerts	Piper	Predictable utility speech is enough when the content matters more than the speaker.
Evidence confidence on this page	Kokoro	CodeSOTA has measured Kokoro; Piper still needs the same hard-text and latency run.

§ 02 · Evidence ledger

What is measured, and what is still only a deployment claim.

Claim	Result	Tier	How to use it
Kokoro hard-text intelligibility	30 prompts, WER 15.6%, CER 6.8%, entity accuracy 66.7%	CodeSOTA measured	Use this as the current local evidence floor, not a final universal ranking.
Kokoro latency on CodeSOTA run	M2 Max, ONNX Runtime, p50 time to first audio 855 ms, p95 2,123 ms	CodeSOTA measured	Good enough for many local prototypes; retest on Raspberry Pi, N100, and the target cloud CPU before shipping.
Kokoro blind preference	234 pairwise votes across 8 model families; Kokoro placed 7th overall, 3rd on number-heavy prompts, and beat Gradium 10-7 head-to-head.	CodeSOTA measured	Naturalness is not the same as fidelity; Kokoro is strong for tiny local TTS, not the overall winner against larger hosted systems.
Piper deployment surface	Fast local neural TTS engine, current Open Home Foundation fork, CLI/server/Python/C/C++ paths, and a 35-language MIT voice repository.	Primary project sources	Pick it when operational simplicity, language availability, and offline repeatability beat expressiveness.
Piper vs Kokoro same-prompt quality	Missing from CodeSOTA today: no Piper run on the same 30 hard prompts, no blind Piper-vs-Kokoro vote table, no p95 target-device latency row.	Gap	Do not write 'Piper is worse' as a measured claim until Piper has the same harness.
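The WER and CER figures in the ledger are edit-distance ratios. A common way to get them for TTS is to transcribe the synthesized audio with an ASR model (not shown here) and compare the transcript with the input text. The sketch below assumes that setup and a simple lowercase-and-strip-punctuation normalization; CodeSOTA's actual ASR model and normalization rules are not stated on this page, so it will not reproduce the 15.6% / 6.8% numbers exactly. The function names are illustrative.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    This scheme is an assumption; real harnesses often also expand
    numbers, dates, and URLs before scoring.
    """
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance over two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer_cer(pairs):
    """Corpus-level WER and CER for (prompt_text, asr_transcript) pairs.

    Errors are summed over all prompts and divided by the total reference
    length, instead of averaging per-prompt rates.
    """
    word_err = word_total = char_err = char_total = 0
    for reference, transcript in pairs:
        ref, hyp = normalize(reference), normalize(transcript)
        word_err += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        char_err += edit_distance(ref, hyp)
        char_total += len(ref)
    return word_err / max(word_total, 1), char_err / max(char_total, 1)


wer, cer = corpus_wer_cer([("Turn on the kitchen light.", "turn on kitchen lights")])
print(f"WER {wer:.1%}, CER {cer:.1%}")  # WER 40.0%, CER 20.0%
```

Summing errors across the prompt set, rather than averaging per-prompt rates, keeps short prompts from dominating the score; entity accuracy needs a separate, prompt-specific check.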


§ 03 · Model facts

The registry view.

Model	Params	License	Deployment	CodeSOTA tier	Best read
Kokoro v1.0	82M	Apache-2.0	local, edge	CodeSOTA measured	Small open-weight voice that sounds better than its size suggests.
Piper	~20M	MIT	local, edge	Community reported	Operationally boring in a useful way: good for offline voice plumbing.
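For install-size planning, the Params column converts to a rough raw-weight footprint. This is a back-of-envelope estimate that assumes unquantized fp32 weights (4 bytes per parameter) and ignores the runtime, phonemizer, and any per-voice files; fp16 or int8 exports would roughly halve or quarter it, and Piper's ~20M figure is itself only community reported.

```latex
\begin{aligned}
\text{weight bytes} &\approx N_{\text{params}} \times 4\ \text{B (fp32)} \\
\text{Kokoro v1.0:}\quad 82 \times 10^{6} \times 4\ \text{B} &\approx 328\ \text{MB} \\
\text{Piper:}\quad 20 \times 10^{6} \times 4\ \text{B} &\approx 80\ \text{MB}
\end{aligned}
```

That is roughly a 4x difference in raw weights; peak RSS and CPU load on the target box, not this estimate, should drive the decision.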

§ 04 · Workload matrix

Pick by where the voice fails.

Workload	Pick	Reason
New local English voice prototype	Kokoro	Fast to evaluate, stronger default voice quality, Apache-2.0 model weights.
Home appliance, kiosk, embedded prompt	Piper	Small local engine, deterministic output, practical voice catalog.
Long listening sessions	Benchmark both	Fatigue can flip the decision; run 20-30 minute listening tests, not only one-sentence demos.
Raspberry Pi / constrained CPU	Piper first, Kokoro second	Piper was built around this class of deployment; Kokoro may still win if the device can run it within the latency budget and voice quality is the priority.
Public voice assistant demo	Kokoro	Users notice prosody and naturalness more than the ops stack during a demo.
Voice cloning or multilingual style control	Neither as default	Compare XTTS, F5-TTS, Chatterbox, and hosted APIs; Piper and Kokoro are not cloning-first choices.

§ 05 · Production bake-off

The minimum benchmark before a real decision.

Check	Run this
Prompt set	Use the same 30 hard-text prompts plus 30 conversational turns and 10 long-form paragraphs.
Objective fidelity	Score WER, CER, critical entity accuracy, URL/date/number failures, and omission and repetition rates.
Latency	Measure cold start, p50/p95 first audio, real-time factor, peak RSS, and CPU load on the real target box.
Preference	Run blind A/B votes with loudness-matched clips, randomized order, identical text, and at least 30 judgments per pair.
Production fit	Check license, voice availability, install size, monitoring path, streamability, and whether the voice is tolerable after repeated listening.
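The latency and preference rows above can be sketched as a small harness. This is a minimal sketch under stated assumptions: per-request timings are collected elsewhere by whatever engine wrapper you use (not shown), percentiles use the nearest-rank convention, real-time factor is total synthesis time divided by total audio duration, and clips are loudness-matched before the A/B session. The function names are hypothetical and not part of Piper's or Kokoro's APIs.

```python
import math
import random
from itertools import combinations


def percentile(samples, q):
    """Nearest-rank percentile; interpolating conventions differ slightly."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(first_audio_ms, synth_s, audio_s):
    """Summarize one engine's run on one target box.

    first_audio_ms: time to first audio per request, in milliseconds.
    synth_s / audio_s: wall-clock synthesis time and output audio length
    per request, in seconds. RTF below 1.0 means faster than real time.
    """
    return {
        "p50_first_audio_ms": percentile(first_audio_ms, 50),
        "p95_first_audio_ms": percentile(first_audio_ms, 95),
        "rtf": sum(synth_s) / sum(audio_s),
    }


def ab_trials(systems, prompts, judgments_per_pair=30, seed=0):
    """Blind A/B schedule: every system pair hears the same text on both
    sides, left/right order is randomized per trial, and the full trial
    list is shuffled. The "pair" field is for scoring only; do not show
    it to listeners.
    """
    rng = random.Random(seed)
    trials = []
    for a, b in combinations(systems, 2):
        for k in range(judgments_per_pair):
            left, right = (a, b) if rng.random() < 0.5 else (b, a)
            trials.append({"prompt": prompts[k % len(prompts)],
                           "left": left, "right": right, "pair": (a, b)})
    rng.shuffle(trials)
    return trials


print(latency_summary([800, 900, 1000, 2000], [1.0, 1.0], [2.0, 2.0]))
# {'p50_first_audio_ms': 900, 'p95_first_audio_ms': 2000, 'rtf': 0.5}
```

A fixed seed makes the trial order reproducible for an audit while staying unpredictable to listeners; with three systems and 30 judgments per pair the schedule has 90 trials.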

§ 06 · Sources

Primary references used for non-CodeSOTA facts.

Source	Used for
Kokoro model card	82M model, Apache-2.0, Hugging Face files
Kokoro inference library	Open-weight 82M TTS model and install path
Piper engine	Current Open Home Foundation fork, local engine and APIs
Piper voices	MIT voice repository covering 35 languages