Model card
VALL-E 2.
Microsoft · proprietary · Unknown params · Neural codec language model (EnCodec tokens)
VALL-E 2 is the first zero-shot text-to-speech system reported to achieve human parity on LibriSpeech. It combines grouped code modeling with repetition-aware sampling. Released June 2024.
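The repetition-aware sampling mentioned above can be illustrated with a minimal sketch. It assumes the commonly described scheme: draw a token with nucleus (top-p) sampling, measure how often that token occurs in a recent window of the decoding history, and fall back to sampling from the full distribution when the ratio exceeds a threshold. The function names, the default values for `top_p`, `window`, and `threshold`, and the use of plain Python lists are all hypothetical choices for illustration, not the released implementation.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample an index from the smallest top-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, top_p=0.8, window=10,
                            threshold=0.1, rng=random):
    """Nucleus sampling with a fallback when the drawn token repeats too often.

    probs:     next-token probabilities (list of floats summing to 1)
    history:   previously generated token ids
    window:    number of recent tokens inspected (assumed value)
    threshold: repetition ratio above which the fallback triggers (assumed value)
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > threshold:
            # The token dominates the recent window: resample from the
            # full, untruncated distribution instead.
            token = rng.choices(list(range(len(probs))), weights=probs, k=1)[0]
    return token
```

With an empty history the function behaves like ordinary nucleus sampling. Once a token saturates the window, lower-probability tokens that nucleus sampling would have cut off become reachable again, which is how the scheme is meant to break decoding loops.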
§ 01 · Benchmarks
Every benchmark VALL-E 2 has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | LJ Speech | Speech · Text-to-Speech | MOS | 4.6 | #1 | 2024-06-08 | source ↗ |
| 02 | VCTK | Speech · Text-to-Speech | MOS | 4.2 | #4 | 2024-06-08 | source ↗ |
The Rank column shows this model's position among all models scored on the same benchmark and metric. #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
§ 03 · Papers
1 paper with results for VALL-E 2.
- 2024-06-08 · Speech · 2 results
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
§ 04 · Related models
Other Microsoft models scored on Codesota.
§ 05 · Sources & freshness
Where these numbers come from.
arXiv: 2 results
2 of 2 rows marked verified.