Voice Cloning · 2019 · en
LibriTTS test-clean zero-shot TTS evaluation
Standard zero-shot voice-cloning / TTS evaluation using LibriTTS test-clean speaker prompts. The primary metric is intelligibility, reported as word error rate (WER) on the resynthesized utterances as transcribed by a frozen ASR model such as HuBERT-Large or Whisper (lower is better). Speaker similarity (SECS) is a secondary metric.
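As a rough illustration of the primary metric, the sketch below computes a corpus-level WER from pairs of reference text and ASR transcript. The function names, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration; published results depend on the specific ASR model and text normalizer used, neither of which is specified here.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def corpus_wer_percent(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words, in percent."""
    total_edits = 0
    total_ref_words = 0
    for reference, transcript in pairs:
        # Assumed normalization: lowercase and split on whitespace only.
        ref = reference.lower().split()
        hyp = transcript.lower().split()
        total_edits += edit_distance(ref, hyp)
        total_ref_words += len(ref)
    if total_ref_words == 0:
        raise ValueError("need at least one reference word")
    return 100.0 * total_edits / total_ref_words


# Hypothetical example: (target text, ASR transcript of the synthesized audio).
example_pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),
    ("hello world", "hello world"),
]
print(f"WER: {corpus_wer_percent(example_pairs):.1f}%")
```

In this example the first pair contributes one substitution and one deletion against six reference words, and the second pair contributes no errors against two reference words.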
Current State of the Art
NaturalSpeech 3
Microsoft
1.81
WER
LibriTTS test-clean (Zero-Shot TTS): WER
3 results · 1 SOTA advance · lower is better
WER Progress Over Time
Showing 3 breakthroughs from Jan 2023 to Mar 2024
Key Milestones
Jan 2023
VALL-E
VALL-E zero-shot TTS with LibriTTS-style test prompts; WER measured via ASR. Seed entry, to be verified (the original VALL-E was evaluated on LibriSpeech).
5.9
Jun 2023
Voicebox
Voicebox zero-shot TTS; WER on LibriSpeech/LibriTTS test-clean. Seed entry, to be verified.
1.9
-67.8%
Mar 2024
NaturalSpeech 3 (Current SOTA)
NaturalSpeech 3 zero-shot TTS; WER on LibriSpeech/LibriTTS test-clean. Seed entry, to be verified.
1.81
-4.7%
Total Improvement
69.3%
Time Span
1y 2m
Breakthroughs
3
Current SOTA
1.81
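The percentage figures above can be reproduced from the three listed WER values. The sketch below assumes each step is a relative change against the previous milestone and uses the headline 1.81 for NaturalSpeech 3; the variable and function names are illustrative only.

```python
# (date, model, WER in percent) as listed in the milestones above.
milestones = [
    ("Jan 2023", "VALL-E", 5.9),
    ("Jun 2023", "Voicebox", 1.9),
    ("Mar 2024", "NaturalSpeech 3", 1.81),
]


def relative_change(old: float, new: float) -> float:
    """Signed relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Step-by-step change between consecutive milestones.
for (_, _, prev_wer), (date, model, wer) in zip(milestones, milestones[1:]):
    print(f"{date} {model}: {relative_change(prev_wer, wer):+.1f}%")

# Total improvement from the first to the last milestone (positive = better).
first_wer = milestones[0][2]
last_wer = milestones[-1][2]
total_improvement = -relative_change(first_wer, last_wer)
print(f"Total improvement: {total_improvement:.1f}%")
```

With these inputs the steps come out to about -67.8% and -4.7%, and the total to about 69.3%, matching the figures shown.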
Top Models Performance Comparison
Top 3 models ranked by WER (lower is better)
Best Score
1.81
Top Model
NaturalSpeech 3
Models Compared
3
Score Range
4.1
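The comparison summary follows directly from the three scores. A minimal sketch, assuming "Score Range" means the gap between the worst and best WER and again taking 1.81 for NaturalSpeech 3:

```python
# WER in percent per model, as listed on this page.
scores = {"VALL-E": 5.9, "Voicebox": 1.9, "NaturalSpeech 3": 1.81}

# Lower WER is better, so sort ascending.
ranked = sorted(scores.items(), key=lambda item: item[1])
best_model, best_score = ranked[0]

# Assumed definition of score range: worst score minus best score.
score_range = max(scores.values()) - min(scores.values())

for rank, (model, wer) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {wer}")
print(f"Best: {best_model} ({best_score}), range: {score_range:.1f}")
```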