Voice Cloning
Replicating a speaker's voice characteristics.
Datasets: 1
Results: 0
Canonical metric: WER
Canonical Benchmark
LibriTTS test-clean (Zero-Shot TTS)
Standard zero-shot voice-cloning / TTS evaluation that uses speaker prompts from LibriTTS test-clean. The primary metric is word error rate (WER) on the resynthesized utterances, measured with a frozen ASR model such as HuBERT-Large or Whisper; it captures intelligibility, and lower is better. Speaker similarity (SECS) is a secondary metric.
Primary metric: WER
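To make the two metrics concrete, here is a minimal sketch of how they could be computed once the ASR transcript and speaker embeddings are already available. It is not the benchmark's official scoring code. The function names `word_error_rate` and `cosine_similarity` are hypothetical, the text normalization (lowercasing and whitespace splitting) is an assumption, and treating SECS as a cosine similarity between speaker-encoder embeddings is likewise an assumption about how the secondary metric is scored.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the target text; `hypothesis` stands in for the transcript
    a frozen ASR model would produce from the synthesized audio.
    Assumed normalization: lowercase and split on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # Toy 2-D vectors standing in for prompt and synthesized-speech embeddings.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

In a real evaluation the hypothesis string would come from the ASR model and the vectors from a speaker encoder; reported WER is also typically sensitive to punctuation and number normalization, which this sketch does not handle.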
Top 10
Leading models on LibriTTS test-clean (Zero-Shot TTS).
No results yet.
All datasets
1 dataset tracked for this task.
Related tasks
Other tasks in Speech.