NaturalSpeech 3.

Microsoft Research · proprietary · Unknown params · Discrete diffusion + codec LM (factored)

NaturalSpeech 3. Disentangled codec with factorized diffusion. Zero-shot TTS. 2024.

§ 02 · Benchmarks

Every benchmark NaturalSpeech 3 has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	VCTK	Audio · Text-to-speech	MOS	4.4	#1/6	2024-03-05	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric (number of competitors after the slash). #1 in red marks the current SOTA. Rows are sorted by rank, then by newest result.
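The ranking and ordering rule described above can be sketched roughly as follows. This is a minimal illustration, not Codesota's actual implementation: the function names `rank_label` and `sort_rows`, the row layout, and the placeholder scores are all assumptions, as is the reading that the number after the slash counts only the other models and that a higher score is better (as with MOS).

```python
from datetime import date


def rank_label(model, scores, higher_is_better=True):
    """Return '#position/competitors' for `model`.

    `scores` maps model name -> value on one benchmark + metric.
    Assumption: the number after the slash excludes the model itself.
    """
    target = scores[model]
    others = [v for name, v in scores.items() if name != model]
    if higher_is_better:
        better = sum(1 for v in others if v > target)
    else:
        better = sum(1 for v in others if v < target)
    return f"#{better + 1}/{len(others)}"


def sort_rows(rows):
    """Sort result rows by rank (ascending), then by newest date first."""
    return sorted(rows, key=lambda r: (r["rank"], -r["date"].toordinal()))


# Placeholder scores for illustration only; these are not real results.
scores = {
    "this_model": 4.4,
    "model_a": 4.1,
    "model_b": 4.0,
    "model_c": 3.9,
    "model_d": 3.8,
    "model_e": 3.7,
    "model_f": 3.5,
}
label = rank_label("this_model", scores)

# Hypothetical rows: two share rank 1, so the newer one is listed first.
rows = [
    {"benchmark": "bench_x", "rank": 2, "date": date(2024, 5, 1)},
    {"benchmark": "bench_y", "rank": 1, "date": date(2023, 1, 1)},
    {"benchmark": "bench_z", "rank": 1, "date": date(2024, 3, 5)},
]
ordered = [r["benchmark"] for r in sort_rows(rows)]
```

With six other models all scoring lower, `label` comes out as `#1/6`, matching the shape of the Rank cell in the table.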

§ 03 · Strengths by area

Where NaturalSpeech 3 actually performs.

Audio · 1 benchmark · avg rank #1.0

§ 04 · Papers

1 paper with results for NaturalSpeech 3.

2024-03-05 · Speech · 1 result
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

§ 05 · Related models

Other Microsoft Research models scored on Codesota.

Faster R-CNN

Unknown params · 7 results

Swin-L (Cascade R-CNN)

1 result

DiT-L (Cascade R-CNN)

Unknown params · 0 results

Faster R-CNN (VGG-16)

~137M params · 0 results

LayoutLMv3-Large

Unknown params · 0 results

NaturalSpeech

N/A params · 0 results

SwinV2-G

0 results

ViT-Adapter-L (BEiT-3)

Unknown params · 0 results

§ 06 · Sources & freshness

Where these numbers come from.

arxiv · 1 result

1 of 1 rows marked verified.
