VCTK: speech data from 110 English speakers with various accents, used for multi-speaker text-to-speech (TTS).
mos (mean opinion score, MOS, on a 1–5 scale)
Higher is better
| Rank | Model | Notes | Source | Score (MOS, 1–5) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | NaturalSpeech 3 | Zero-shot VCTK evaluation. Source: Table 3, arXiv:2403.03100 (2024) | Community | 4.36 | 2026 | Source |
| 2 | Ground Truth (VCTK) | Human recordings from VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. | Community | 4.26 | 2022 | Source |
| 3 | VITS | Multi-speaker VITS on VCTK. Source: Table 2, arXiv:2106.06103 (ICML 2021) | Community | 4.21 | 2026 | Source |
| 4 | StyleTTS 2 | Multi-speaker StyleTTS 2 on VCTK. Source: Table 3, arXiv:2306.07279 (NeurIPS 2023) | Community | 4.19 | 2023 | Source |
| 5 | VALL-E 2 | Zero-shot multi-speaker on VCTK. Source: Table 1, arXiv:2406.05370 (Jun 2024) | Community | 4.18 | 2026 | Source |
| 6 | XTTS v2 | Zero-shot on VCTK speakers. Source: arXiv:2304.01196 | Community | 4.14 | 2026 | Source |
| 7 | YourTTS | Zero-shot on VCTK. Source: Table 2, arXiv:2202.04053 (ICML 2022) | Community | 4.07 | 2022 | Source |
| 8 | SC-GlowTTS | Multi-speaker GlowTTS baseline. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. | Community | 3.78 | 2022 | Source |
sim-score (speaker similarity MOS, Sim-MOS)
Higher is better
| Rank | Model | Notes | Source | Score (Sim-MOS) | Year | Paper |
|---|---|---|---|---|---|---|
| 1 | Ground Truth (VCTK) | Human recordings, VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. | Community | 4.19 | 2022 | Source |
| 2 | YourTTS | VCTK test set (Exp. 1, monolingual). Casanova et al., ICML 2022. | Community | 4.16 ± 0.05 | 2022 | Source |
| 3 | VITS2 | VCTK multi-speaker test set. Kong et al., Interspeech 2023. | Community | 3.99 ± 0.08 | 2023 | Source |
| 4 | SC-GlowTTS | VCTK test set. Reported in YourTTS (Casanova et al., ICML 2022), Table 1. | Community | 3.99 | 2022 | Source |
| 5 | VITS | VCTK multi-speaker test set. Kong et al., Interspeech 2023 (VITS2 paper, Table 2b). | Community | 3.79 ± 0.09 | 2023 | Source |
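
Several entries above report a score with a ± margin. As a minimal sketch of how such a figure could be produced, the snippet below averages listener ratings and attaches a normal-approximation 95% confidence interval. It assumes the ± values are 95% confidence intervals, which is common in MOS reporting but not stated on this page. The function name `mos_with_ci` and the ratings are hypothetical, invented for illustration; they are not data from any listed paper.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of listener ratings.

    half_width is the normal-approximation confidence-interval margin;
    z=1.96 corresponds to roughly 95% coverage (an assumption here).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Sample standard deviation (n - 1 denominator).
    stdev = statistics.stdev(ratings)
    half_width = z * stdev / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings on the 1-5 scale, for illustration only.
ratings = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]
mean, half_width = mos_with_ci(ratings)
print(f"MOS = {mean:.2f} ± {half_width:.2f}")
```

With only a handful of ratings the margin is wide. Scores from different papers also come from different listener pools and test utterances, so small gaps between rows may fall within the reported margins.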