§ 00 · Direct answer

Piper vs Kokoro: Kokoro wins quality, Piper wins appliances.

Direct answer: choose Kokoro when people will judge the voice. Choose Piper when the system needs simple offline speech, predictable packaging, and many practical voices. The important caveat: CodeSOTA has measured Kokoro on hard text, but Piper still needs the same run before we call the quality gap quantified.

§ 01 · Short decision table

The winner changes with the job.

Question | Winner | Why
Local voice quality default | Kokoro | More modern, more listenable, and CodeSOTA has an artifact-backed Kokoro run.
Small appliance / offline fleet | Piper | Simpler engine path, broad voice catalog, and a long Home Assistant style deployment history.
Voice-agent demo people will judge | Kokoro | A better first impression matters when the voice is part of the product surface.
Status prompts, kiosks, alerts | Piper | Predictable utility speech is enough when the content matters more than the speaker.
Evidence confidence on this page | Kokoro | CodeSOTA has measured Kokoro; Piper still needs the same hard-text and latency run.
§ 02 · Evidence ledger

What is measured, and what is still only a deployment claim.

Claim | Result | Tier | How to use it
Kokoro hard-text intelligibility | 30 prompts, WER 15.6%, CER 6.8%, entity accuracy 66.7% | CodeSOTA measured | Use this as the current local evidence floor, not a final universal ranking.
Kokoro latency on CodeSOTA run | M2 Max, ONNX Runtime, p50 first audio 855 ms, p95 2123 ms | CodeSOTA measured | Good enough for many local prototypes; retest on Raspberry Pi, N100, and target cloud CPU before shipping.
Kokoro blind preference | 234 pairwise votes across 8 model families; Kokoro placed 7th overall, 3rd on number-heavy prompts, and beat Gradium 10-7 head-to-head. | CodeSOTA measured | Naturalness is not the same as fidelity; Kokoro is strong for tiny local TTS, not the overall winner against larger hosted systems.
Piper deployment surface | Fast local neural TTS engine, current Open Home Foundation fork, CLI/server/Python/C/C++ paths, and a 35-language MIT voice repository. | Primary project sources | Pick it when operational simplicity, language availability, and offline repeatability beat expressiveness.
Piper vs Kokoro same-prompt quality | Missing from CodeSOTA today: no Piper run on the same 30 hard prompts, no blind Piper-vs-Kokoro vote table, no p95 target-device latency row. | Gap | Do not write 'Piper is worse' as a measured claim until Piper has the same harness.
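The WER and CER figures in the ledger are edit-distance ratios between a reference prompt and a transcript of the synthesized audio. The sketch below shows one common way to compute them. It is not the CodeSOTA harness: the function names and the normalization step (lowercasing, stripping punctuation) are assumptions made for illustration, and a real run would first need an ASR transcript for each audio clip. Entity accuracy is left out because the page does not define how it is scored.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (an assumed normalization)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, transcript: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref = normalize(reference).split()
    if not ref:
        raise ValueError("reference has no words after normalization")
    return edit_distance(ref, normalize(transcript).split()) / len(ref)


def cer(reference: str, transcript: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    ref = normalize(reference)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, normalize(transcript)) / len(ref)
```

Scoring Piper with the same functions, the same prompts, and the same transcriber is what would turn the gap row into a measured comparison. Normalization choices move both numbers, so they should be fixed before either model is scored.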
§ 03 · Model facts

The registry view.

Model | Params | License | Deployment | CodeSOTA tier | Best read
Kokoro v1.0 | 82M | Apache-2.0 | local, edge | CodeSOTA measured | Small open-weight voice that sounds better than its size suggests.
Piper | ~20M | MIT | local, edge | Community reported | Operationally boring in a useful way: good for offline voice plumbing.
§ 04 · Workload matrix

Pick by where the voice fails.

Workload | Pick | Reason
New local English voice prototype | Kokoro | Fast to evaluate, stronger default voice quality, Apache-2.0 model weights.
Home appliance, kiosk, embedded prompt | Piper | Small local engine, deterministic output, practical voice catalog.
Long listening sessions | Benchmark both | Fatigue can flip the decision; run 20-30 minute listening tests, not only one-sentence demos.
Raspberry Pi / constrained CPU | Piper first, Kokoro second | Piper was built around this class of deployment; Kokoro may still win if the device can handle the voice quality target.
Public voice assistant demo | Kokoro | Users notice prosody and naturalness more than the ops stack during a demo.
Voice cloning or multilingual style control | Neither as default | Compare XTTS, F5-TTS, Chatterbox, and hosted APIs; Piper and Kokoro are not cloning-first choices.
§ 05 · Production bake-off

The minimum benchmark before a real decision.

Check | Run this
Prompt set | Use the same 30 hard-text prompts plus 30 conversational turns and 10 long-form paragraphs.
Objective fidelity | Score WER, CER, critical entity accuracy, URL/date/number failures, omission and repetition rate.
Latency | Measure cold start, p50/p95 first audio, real-time factor, peak RSS, and CPU load on the real target box.
Preference | Run blind A/B votes with volume matched, randomized order, same text, and at least 30 judgments per pair.
Production fit | Check license, voice availability, install size, monitoring path, streamability, and whether the voice is tolerable after repeated listening.
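The latency and preference checks above reduce to a few small calculations once the raw timings and votes are collected. The sketch below is one possible shape for them, not the CodeSOTA harness: every function name is hypothetical, the percentile uses the nearest-rank method (other definitions interpolate and give slightly different values), and capturing the timings themselves depends on whichever engine API is under test.

```python
import math
import random
from collections import Counter


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def real_time_factor(synthesis_seconds, audio_seconds):
    """Below 1.0 means audio is generated faster than it plays back."""
    return synthesis_seconds / audio_seconds


def blind_pairs(prompts, systems=("piper", "kokoro"), seed=0):
    """Return (prompt, left, right) trials with left/right order shuffled per prompt."""
    rng = random.Random(seed)
    trials = []
    for prompt in prompts:
        order = list(systems)
        rng.shuffle(order)  # listeners should not learn which side is which
        trials.append((prompt, order[0], order[1]))
    return trials


def tally(votes):
    """Map each system name to (wins, share of all votes)."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {name: (wins, wins / total) for name, wins in counts.items()}


# Usage with made-up first-audio timings in milliseconds.
first_audio_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
p50 = percentile(first_audio_ms, 50)
p95 = percentile(first_audio_ms, 95)
```

With only a handful of timing samples, p95 is effectively the slowest run, so the latency check is only meaningful with enough requests on the real target box. The same caution applies to votes: a 10-7 split across 17 judgments is a direction, not a settled ranking, which is why the table asks for at least 30 judgments per pair.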
§ 06 · Sources

Primary references used for non-CodeSOTA facts.

Source | Used for
Kokoro model card | 82M model, Apache-2.0, Hugging Face files
Kokoro inference library | Open-weight 82M TTS model and install path
Piper engine | Current Open Home Foundation fork, local engine and APIs
Piper voices | MIT voice repository covering 35 languages