Voice Cloning
Replicating a speaker's voice characteristics.
LibriTTS test-clean (Zero-Shot TTS)
Standard zero-shot voice-cloning / TTS evaluation using speaker prompts from LibriTTS test-clean. Word error rate (WER) on the resynthesized utterances, measured by transcribing them with a frozen ASR model such as HuBERT-Large or Whisper, is the primary intelligibility metric (lower is better). Speaker similarity (SECS, speaker encoder cosine similarity) is a secondary metric (higher is better).
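As a rough illustration of the two metrics, the sketch below computes WER as word-level edit distance divided by reference length, and speaker similarity as the cosine between two embedding vectors. It is a minimal sketch, not this benchmark's scoring code: the function names, the lowercase-and-split text normalization, and the toy embedding values are all assumptions, and in practice the transcript would come from the frozen ASR model and the embeddings from a speaker-verification encoder.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    A real evaluation may apply a different normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical transcript pair: one substitution out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # Hypothetical 3-dimensional embeddings; real encoders emit far larger vectors.
    print(cosine_similarity([0.2, 0.5, 0.1], [0.25, 0.45, 0.05]))
```

Corpus-level WER is commonly reported as total edits divided by total reference words across all utterances rather than as the mean of per-utterance scores; which convention applies here is not stated on this page.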
Top 10
Leading models on LibriTTS test-clean (Zero-Shot TTS).
No results yet. Be the first to contribute.
All datasets
1 dataset tracked for this task.