Model card
NaturalSpeech 3.
Microsoft Research · proprietary · Unknown params · Discrete diffusion + codec LM (factored)
NaturalSpeech 3. Disentangled codec with factorized diffusion. Zero-shot TTS. 2024.
§ 01 · Benchmarks
Every benchmark NaturalSpeech 3 has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | VCTK | Speech · Text-to-Speech | MOS | 4.4 | #1 | 2024-03-05 | source ↗ |
The Rank column shows this model's position among all models scored on the same benchmark and metric. #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
§ 03 · Papers
1 paper with results for NaturalSpeech 3.
- 2024-03-05 · Speech · 1 result
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
§ 04 · Related models
Other Microsoft Research models scored on Codesota.
Faster R-CNN
Unknown params · 7 results
Swin-L (Cascade R-CNN)
1 result
DiT-L (Cascade R-CNN)
Unknown params · 0 results
Faster R-CNN (VGG-16)
~137M params · 0 results
LayoutLMv3-Large
Unknown params · 0 results
NaturalSpeech
N/A params · 0 results
SwinV2-G
0 results
ViT-Adapter-L (BEiT-3)
Unknown params · 0 results
§ 05 · Sources & freshness
Where these numbers come from.
arXiv · 1 result
1 of 1 rows marked verified.