Model card

NaturalSpeech 3.

Microsoft Research · proprietary · Unknown params · Discrete diffusion + codec LM (factored)

NaturalSpeech 3. Disentangled codec with factorized diffusion. Zero-shot TTS. 2024.

§ 01 · Benchmarks

Every benchmark NaturalSpeech 3 has a recorded score for.

# | Benchmark | Area · Task | Metric | Value | Rank | Date | Source
01 | VCTK | Speech · Text-to-Speech | MOS | 4.4 | #1/6 | 2024-03-05 | source ↗
The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is the count of competitors. A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
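
The ranking and sort order described in this note can be sketched in a few lines of Python. This is a minimal illustration, not Codesota's actual implementation: the `Result` record, the `rank_of` and `sort_rows` helpers, and the placeholder peer scores are all hypothetical, and the sketch assumes that higher metric values are better (true for MOS) and that the number after the slash counts the other models.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical record, not Codesota's schema)."""
    model: str
    benchmark: str
    metric: str
    value: float
    day: date


def rank_of(target, results):
    """Return (rank, competitors) among rows sharing benchmark and metric."""
    peers = [
        r for r in results
        if r.model != target.model
        and (r.benchmark, r.metric) == (target.benchmark, target.metric)
    ]
    # Assumption: higher is better, so rank = 1 + number of peers that score higher.
    rank = 1 + sum(1 for r in peers if r.value > target.value)
    return rank, len(peers)


def sort_rows(rows):
    """Sort (rank, Result) pairs: best rank first, newest date first on ties."""
    return sorted(rows, key=lambda row: (row[0], -row[1].day.toordinal()))


# The NaturalSpeech 3 row comes from the table; the peers are made-up placeholders.
ns3 = Result("NaturalSpeech 3", "VCTK", "mos", 4.4, date(2024, 3, 5))
peers = [
    Result(f"placeholder-{i}", "VCTK", "mos", 4.0 - 0.1 * i, date(2023, 1, 1))
    for i in range(6)
]
rank, competitors = rank_of(ns3, [ns3, *peers])
label = f"#{rank}/{competitors}"  # "#1/6"
```

Negating the date ordinal keeps the whole ordering in a single sort key, so ties on rank fall back to the newest result without a second pass.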

§ 02 · Strengths by area

Where NaturalSpeech 3 actually performs.

Speech · 1 benchmark · avg rank #1.0

§ 03 · Papers

1 paper with results for NaturalSpeech 3.

  1. 2024-03-05 · Speech · 1 result

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

§ 04 · Related models

Other Microsoft Research models scored on Codesota.

Faster R-CNN · Unknown params · 7 results
Swin-L (Cascade R-CNN) · 1 result
DiT-L (Cascade R-CNN) · Unknown params · 0 results
Faster R-CNN (VGG-16) · ~137M params · 0 results
LayoutLMv3-Large · Unknown params · 0 results
NaturalSpeech · N/A params · 0 results
SwinV2-G · 0 results
ViT-Adapter-L (BEiT-3) · Unknown params · 0 results

§ 05 · Sources & freshness

Where these numbers come from.

arXiv · 1 result

1 of 1 rows marked verified.