Codesota · Benchmark · LibriTTS test-clean (Zero-Shot TTS)

LibriTTS test-clean (Zero-Shot TTS).

Standard zero-shot voice-cloning / TTS evaluation using LibriTTS test-clean speaker prompts. Word error rate (WER) on the resynthesized utterances, measured with a frozen ASR model such as HuBERT-Large or Whisper, is the primary intelligibility metric (lower is better); speaker similarity (SECS, speaker encoder cosine similarity) is a secondary metric.
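SECS is generally computed as the cosine similarity between a speaker embedding of the prompt audio and one of the synthesized audio, with higher values indicating a closer voice match. The sketch below shows only that final step. It assumes the embeddings have already been extracted by some speaker encoder (this page does not say which one), and the function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_secs(pairs: Sequence[tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average similarity over (prompt_embedding, synthesized_embedding) pairs."""
    if not pairs:
        raise ValueError("need at least one embedding pair")
    return sum(cosine_similarity(p, s) for p, s in pairs) / len(pairs)


# Toy 2-D vectors standing in for real speaker embeddings (assumption).
toy_pairs = [
    ([3.0, 4.0], [3.0, 4.0]),  # identical direction
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
print(mean_secs(toy_pairs))
```

Real speaker embeddings have hundreds of dimensions, and scores are only comparable between systems when the same encoder is used for all of them.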

§ 01 · Leaderboard

Results by metric.

Only 3 models on this benchmark

WER

WER (word error rate) is the reported evaluation metric for LibriTTS test-clean (Zero-Shot TTS). Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Lower is better
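A minimal sketch of how a corpus-level WER could be computed, assuming the ASR transcripts of the synthesized audio are already available as strings. The function names and the lowercase-and-split normalization are illustrative assumptions, not this benchmark's actual scoring pipeline; published results typically apply a fuller text normalizer.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Sequence[tuple[str, str]]) -> float:
    """WER in percent over (reference_text, asr_transcript) pairs."""
    errors = 0
    ref_words = 0
    for reference, transcript in pairs:
        ref = reference.lower().split()
        hyp = transcript.lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / ref_words


# Made-up transcripts for illustration: 1 deletion + 1 substitution over 8 words.
example_pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),
    ("hello world", "hello word"),
]
print(corpus_wer(example_pairs))  # 25.0
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones; averaging per-utterance WER instead would give a different number.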

Trust tiers for WER: verified, paper, vendor, community, unverified
| Rank | Model | Trust | WER (%) | Year | Notes |
|------|-------|-------|---------|------|-------|
| 01 | NaturalSpeech 3 | paper | 1.81 | 2026 | LibriSpeech/LibriTTS test-clean zero-shot, WER. Seed entry; verify. |
| 02 | Voicebox | paper | 1.90 | 2026 | Zero-shot TTS, LibriSpeech/LibriTTS test-clean WER. Seed entry; verify. |
| 03 | VALL-E | paper | 5.90 | 2026 | Zero-shot TTS, LibriTTS-style test prompts, WER via ASR. Seed entry; verify (original VALL-E evaluated on LibriSpeech). |
