Voice Cloning

Replicating a speaker's voice characteristics.

Datasets: 1
Results: 0
Canonical metric: WER
Canonical Benchmark

LibriTTS test-clean (Zero-Shot TTS)

Standard zero-shot voice-cloning / TTS evaluation that uses speaker prompts from LibriTTS test-clean. The primary metric is word error rate (WER) on the synthesized utterances, measured by transcribing them with a frozen ASR model such as HuBERT-Large or Whisper; it reflects intelligibility, and lower is better. Speaker similarity (SECS) is a secondary metric.
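As a rough illustration of the two metrics named above, here is a minimal sketch in plain Python. It assumes the ASR transcripts and speaker embeddings have already been produced elsewhere. The function names `word_error_rate` and `cosine_similarity` are hypothetical, and the normalization (lowercasing and whitespace splitting) is an assumption; an actual leaderboard pipeline may normalize text differently. SECS is taken here to mean the cosine similarity between speaker-encoder embeddings of the prompt and of the synthesized audio.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Reference text vs. a (made-up) ASR transcript of the synthesized audio.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Toy 2-dimensional "embeddings", for illustration only.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

In this sketch, the reference string would be the target text given to the TTS model and the hypothesis would be the frozen ASR model's transcript of the generated audio. A corpus-level WER is usually computed by summing errors and reference lengths over all utterances rather than averaging per-utterance rates.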

Primary metric: WER (lower is better)

Top 10

Leading models on LibriTTS test-clean (Zero-Shot TTS).

No results yet.

All datasets

1 dataset tracked for this task.
