Codesota · Benchmark · VCTK

VCTK.

Speech data from 110 English speakers with various accents. Used for multi-speaker TTS.

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


mos

MOS (mean opinion score, rated on a 1–5 scale) is a reported evaluation metric for VCTK. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
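As a rough illustration of how a MOS figure with a "±" interval is typically obtained, the sketch below averages 1–5 listener ratings and adds a normal-approximation 95% confidence half-width. The function name `mean_opinion_score`, the `z=1.96` default, and the ratings themselves are assumptions made up for this example; individual papers on this leaderboard may use different rating protocols and interval definitions.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation confidence half-width
    (z = 1.96 corresponds to roughly 95%). This is an assumed
    convention, not necessarily the one used by any listed paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1..5")
    mos = statistics.fmean(ratings)
    # Sample standard deviation divided by sqrt(n) is the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one system; not taken from any paper.
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Because each paper runs its own listening test with its own raters, MOS values from different sources are only loosely comparable, even when computed the same way.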

Trust tiers for mos: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Link | Notes |
|------|-------|-------|-------|------|------|-------|
| 01 | NaturalSpeech 3 | verified | 4.36 | 2024 | Paper | MOS (1–5). Zero-shot VCTK evaluation. Source: Table 3, arxiv:2403.03100 (2024) |
| 02 | Ground Truth (VCTK) | verified | 4.26 | 2022 | Source | Human recordings from VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. |
| 03 | VITS | verified | 4.21 | 2021 | Source | MOS (1–5). VITS multispeaker on VCTK. Source: Table 2, arxiv:2106.06103 (ICML 2021) |
| 04 | StyleTTS 2 | verified | 4.19 | 2023 | Paper | MOS (1–5). StyleTTS 2 multispeaker on VCTK. Source: Table 3, arxiv:2306.07279 (NeurIPS 2023) |
| 05 | StyleTTS2 | verified | 4.19 | 2023 | Source | MOS (1–5). StyleTTS 2 multispeaker on VCTK. Source: Table 3, arxiv:2306.07279 (NeurIPS 2023) |
| 06 | VALL-E 2 | verified | 4.18 | 2024 | Paper | MOS (1–5). Zero-shot multi-speaker on VCTK. Source: Table 1, arxiv:2406.05370 (Jun 2024) |
| 07 | XTTS v2 | verified | 4.14 | 2023 | Paper | MOS (1–5). XTTS v2 zero-shot on VCTK speakers. Source: arxiv:2304.01196 |
| 08 | YourTTS | verified | 4.07 | 2022 | Source | MOS (1–5). YourTTS zero-shot on VCTK. Source: Table 2, arxiv:2202.04053 (ICML 2022) |
| 09 | SC-GlowTTS | verified | 3.78 | 2022 | Source | Multi-speaker GlowTTS baseline. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. |

sim-score

Sim-score (speaker-similarity MOS, Sim-MOS) is a reported evaluation metric for VCTK. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better

Trust tiers for sim-score: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Link | Notes |
|------|-------|-------|-------|------|------|-------|
| 01 | Ground Truth (VCTK) | verified | 4.19 | 2022 | Source | Sim-MOS for human recordings, VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. |
| 02 | YourTTS | verified | 4.16 | 2022 | Source | Sim-MOS on VCTK test set (Exp 1 monolingual) ±0.05. Casanova et al., ICML 2022. |
| 03 | SC-GlowTTS | verified | 3.99 | 2022 | Source | Sim-MOS on VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. |
| 04 | VITS2 | verified | 3.99 | 2023 | Source | Speaker similarity MOS on VCTK multi-speaker test set ±0.08. Kong et al., Interspeech 2023. |
| 05 | VITS | verified | 3.79 | 2023 | Source | Speaker similarity MOS on VCTK multi-speaker test set ±0.09. Kong et al., Interspeech 2023 (VITS2 paper, Table 2b). |
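The "higher is better" ordering of the sim-score rows can be reproduced with a short sort. This is a minimal sketch, not Codesota's actual ranking logic: the `Entry` type and `rank` function are hypothetical, and breaking the 3.99 tie by earlier year is an assumption that happens to match the order shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float
    year: int


# Sim-score rows as listed on this page, deliberately shuffled.
ENTRIES = [
    Entry("VITS", 3.79, 2023),
    Entry("YourTTS", 4.16, 2022),
    Entry("VITS2", 3.99, 2023),
    Entry("Ground Truth (VCTK)", 4.19, 2022),
    Entry("SC-GlowTTS", 3.99, 2022),
]


def rank(entries):
    """Sort by score descending; ties go to the earlier year (assumed rule)."""
    return sorted(entries, key=lambda e: (-e.score, e.year))


ranked = rank(ENTRIES)
for position, entry in enumerate(ranked, start=1):
    print(f"{position:02d} {entry.model} {entry.score:.2f} ({entry.year})")
```

With the reported intervals of ±0.05 to ±0.09, adjacent ranks such as Ground Truth (4.19) and YourTTS (4.16) are closer than the stated uncertainty, so the ordering is best read as indicative rather than strict.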
§ 03 · Lineage

VCTK in context.

This benchmark (1)
VCTK (active, 2019-11)
Successors (1)
Seed-TTS-Eval (active, 2024-06)
Model quality improved enough that basic corpora no longer exposed enough failure modes; harder text, similarity, and robustness became more important.
§ 04 · Submit a result

Add to the leaderboard.
