LJ Speech

13,100 short audio clips of a single speaker reading passages from non-fiction books. Standard benchmark for single-speaker TTS.

Benchmark Stats

Models: 11
Papers: 11
Metrics: 1

MOS (mean opinion score)

Higher is better

| Rank | Model | Source | Score | Year | Paper / notes |
|------|-------|--------|-------|------|---------------|
| 1 | VALL-E 2 | Community | 4.61 | 2026 | MOS (1–5). Human parity: CMOS +0.17 above ground truth. Source: Table 1, arXiv:2406.05370 (Jun 2024). |
| 2 | NaturalSpeech | Community | 4.56 | 2026 | MOS 4.56 ±0.13 on LJSpeech. Human GT = 4.58 ±0.13; difference not statistically significant (p>0.05, Wilcoxon). First TTS system to achieve human-level quality on LJSpeech. IEEE TASLP 2024 (arXiv:2205.04421, Table 2). |
| 3 | StyleTTS2 | Community | 4.55 | 2026 | MOS (1–5). Surpasses human baseline (4.44 MOS). Source: Table 2, arXiv:2306.07691 (NeurIPS 2023). |
| 4 | VITS | Community | 4.43 | 2026 | MOS (1–5). VITS end-to-end TTS. Source: Table 2, arXiv:2106.06103 (ICML 2021). |
| 5 | Grad-TTS + HiFi-GAN | Community | 4.37 | 2026 | MOS 4.37 ±0.13 on LJSpeech. From the NaturalSpeech paper (arXiv:2205.04421, Table 4). Human GT = 4.58 in the same evaluation. |
| 6 | Glow-TTS + HiFi-GAN | Community | 4.34 | 2026 | MOS 4.34 ±0.13 on LJSpeech. From the NaturalSpeech paper (arXiv:2205.04421, Table 4). Human GT = 4.58 in the same evaluation. |
| 7 | FastSpeech2 + HiFi-GAN | Community | 4.32 | 2026 | MOS 4.32 ±0.15 on LJSpeech. From the NaturalSpeech paper (arXiv:2205.04421, Table 4). Human GT = 4.58 in the same evaluation. |
| 8 | Voicebox | Community | 4.3 | 2026 | MOS (1–5). Voicebox single-speaker on LJ Speech. Source: Table 1, arXiv:2306.15687 (NeurIPS 2023). |
| 9 | XTTS v2 | Community | 4.21 | 2026 | MOS (1–5). XTTS v2 evaluated on LJ Speech. Source: arXiv:2304.01196 evaluation. |
| 10 | Matcha-TTS | Community | 3.84 | 2026 | MOS 3.84 ±0.08 on LJSpeech, 10 ODE solver steps (best variant). Vocoded reference = 4.13 in the same evaluation. ICASSP 2024 (arXiv:2309.03199, Table 1). Flow-matching architecture; significantly outperforms Grad-TTS. |
| 11 | JETS | Community | 3.57 | 2026 | MOS 3.57 ±0.09 on LJSpeech (in-distribution). From the StyleTTS2 paper (NeurIPS 2023, arXiv:2306.07691, Table 2). Human GT = 3.81 in the same evaluation. |
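
Several entries above report MOS as a mean with a ± interval, and the NaturalSpeech entry compares the system to ground truth. The following is a minimal sketch of how such figures can be derived from raw listener ratings. It assumes a normal-approximation 95% interval (z = 1.96), which may differ from the procedure used in any of the cited papers. The function names `mos_with_ci` and `intervals_overlap` and the sample ratings are hypothetical and exist only for illustration.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of 1-5 listener ratings.

    Assumption: normal-approximation interval, half_width = z * s / sqrt(n).
    The papers cited on this page may compute their intervals differently.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


def intervals_overlap(a, b):
    """Rough check: do two (mean, half_width) intervals overlap?"""
    (mean_a, hw_a), (mean_b, hw_b) = a, b
    return abs(mean_a - mean_b) <= hw_a + hw_b


# Hypothetical ratings from five listeners, not data from any paper.
mean, half_width = mos_with_ci([5, 4, 4, 5, 3])
print(f"MOS {mean:.2f} ±{half_width:.2f}")

# Values taken from the NaturalSpeech row: system 4.56 ±0.13, human GT 4.58 ±0.13.
print(intervals_overlap((4.56, 0.13), (4.58, 0.13)))
```

Overlapping intervals are only a quick sanity check. They do not replace a paired significance test such as the Wilcoxon test mentioned in the NaturalSpeech entry. Scores from different rows often come from different listener panels and are not directly comparable.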