Voice Cloning

Replicating a speaker's voice characteristics.

Datasets: 1
Results: 0
Canonical metric: WER
Canonical Benchmark

LibriTTS test-clean (Zero-Shot TTS)

Standard zero-shot voice-cloning / TTS evaluation using LibriTTS test-clean speaker prompts. The primary metric is word error rate (WER) on the resynthesized utterances, measured by transcribing them with a frozen ASR model such as HuBERT-Large or Whisper; it reflects intelligibility, and lower is better. Speaker similarity (SECS, speaker encoder cosine similarity) is a secondary metric, and higher is better.
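As a rough illustration of the two metrics, the sketch below computes WER as word-level edit distance divided by reference length, and speaker similarity as the cosine between two embedding vectors. It is a minimal sketch, not this benchmark's scoring code: the function names are hypothetical, the transcripts are assumed to come from an ASR model run separately, the embeddings are assumed to come from a speaker encoder run separately, and the only text normalization applied is lowercasing and whitespace splitting.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercasing and whitespace splitting are the only normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.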

Primary metric: WER

Top 10

Leading models on LibriTTS test-clean (Zero-Shot TTS).

No results yet.

All datasets

1 dataset tracked for this task.

Related tasks

Other tasks in Speech.
