LJ Speech contains 13,100 short audio clips of a single speaker reading passages from non-fiction books. It is a standard benchmark for single-speaker text-to-speech (TTS).
MOS (mean opinion score) is the reported evaluation metric for LJ Speech. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.
Higher is better
| Rank | Model | Trust | Score (MOS) | Year | Links |
|---|---|---|---|---|---|
| 01 | VALL-E 2 | verified | 4.61 | 2026 | Source ↗ |
| 02 | NaturalSpeech | paper | 4.56 | 2026 | Source ↗ |
| 03 | StyleTTS2 | paper | 4.55 | 2026 | Source ↗ |
| 04 | StyleTTS 2 | verified | 4.55 | 2023 | Paper ↗ |
| 05 | VITS | verified | 4.43 | 2021 | Paper ↗ |
| 06 | Grad-TTS + HiFi-GAN | paper | 4.37 | 2026 | Source ↗ |
| 07 | Glow-TTS + HiFi-GAN | paper | 4.34 | 2026 | Source ↗ |
| 08 | FastSpeech2 + HiFi-GAN | paper | 4.32 | 2026 | Source ↗ |
| 09 | Voicebox | verified | 4.30 | 2026 | Source ↗ |
| 10 | XTTS v2 | verified | 4.21 | 2026 | Source ↗ |
| 11 | Matcha-TTS | paper | 3.84 | 2026 | Source ↗ |
| 12 | JETS | paper | 3.57 | 2026 | Source ↗ |
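A MOS is the arithmetic mean of individual listener ratings, and papers commonly report it together with a 95% confidence interval. The sketch below shows that calculation, assuming a 1 to 5 rating scale and a normal-approximation interval (z · s / √n). The function name `mos_with_ci` and the ratings are hypothetical and illustrative only. Individual papers may use different rater pools, scales, or interval methods, so scores taken from different sources are not strictly comparable.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes integer opinion scores on a 1-5 scale and uses the normal
    approximation z * s / sqrt(n); this is a common choice, not a universal one.
    """
    scores = list(ratings)
    if len(scores) < 2:
        raise ValueError("need at least two ratings for a confidence interval")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(scores)                                        # the MOS itself
    half_width = z * stdev(scores) / math.sqrt(len(scores))  # sample std dev
    return m, half_width


# Hypothetical ratings for one system (not taken from any paper).
ratings = [5, 4, 5, 4, 4, 5, 3, 5, 4, 5]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # prints: MOS = 4.40 ± 0.43
```

Leaderboard rows are then ordered by the MOS value, highest first.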