Text-to-3D

Generate 3D models from text descriptions. Enables rapid prototyping and creative 3D content generation.

How Text-to-3D Works

A technical deep-dive into text-to-3D generation, from Score Distillation Sampling to feed-forward models and multi-view reconstruction.

1. Text-to-3D Approaches

Four main paradigms for generating 3D content from text descriptions.

Score Distillation: optimize a 3D representation using 2D diffusion guidance.
  Examples: DreamFusion, Magic3D, ProlificDreamer
  Pros: high quality, flexible prompts
  Cons: slow (hours), multi-face "Janus" problem

Feed-Forward: direct prediction from text embeddings.
  Examples: Point-E, Shap-E, OpenLRM
  Pros: fast (seconds), consistent
  Cons: lower detail, limited diversity

Native 3D Diffusion: diffusion directly in 3D space.
  Examples: 3DGen, Direct3D, LAS-Diffusion
  Pros: native 3D consistency, good geometry
  Cons: limited training data, emerging field

Multi-view + Reconstruction: generate views, then reconstruct 3D.
  Examples: MVDream, Zero123++, Instant3D
  Pros: leverages 2D models, good consistency
  Cons: two-stage process, limited view coverage

Speed vs Quality Tradeoff

Seconds: Point-E, Shap-E, LGM
Hours: DreamFusion, Magic3D
2. Model Evolution

From the first CLIP-guided methods to modern feed-forward generation.

CLIP-Forge (2022, Feed-Forward): early CLIP-guided generation
DreamFusion (2022, SDS): Score Distillation Sampling breakthrough
Point-E (2022, Feed-Forward): OpenAI, fast point cloud generation
Shap-E (2023, Feed-Forward): OpenAI, implicit neural representations
Magic3D (2023, SDS): coarse-to-fine, higher resolution
ProlificDreamer (2023, VSD): Variational Score Distillation, better quality
Zero123++ (2023, Multi-view): consistent multi-view generation
MVDream (2023, Multi-view): multi-view diffusion model
Instant3D (2024, Feed-Forward): fast sparse-view reconstruction
LGM (2024, Feed-Forward): Gaussian splatting in seconds
Fastest: LGM / Instant3D (seconds to 3D Gaussians)
Highest Quality: ProlificDreamer (VSD for better details)
Best Balance: MVDream + LGM (quality + speed combo)
3. Score Distillation Sampling (SDS)

The key insight of DreamFusion: use a 2D diffusion model to optimize a 3D representation.

How SDS Works

3D model (NeRF) -> render -> 2D image -> diffusion model scores it against the prompt -> gradient -> update 3D model

Score Distillation
Ask the diffusion model: "what noise would you add to make this image better match the prompt?" That signal becomes a gradient that updates the 3D model.
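
A minimal sketch of one SDS optimization step, assuming a differentiable renderer and a frozen text-conditioned U-Net (all names here are illustrative, not a specific library's API):

import torch

def sds_step(renderer, unet, text_emb, alphas_cumprod, optimizer):
    image = renderer()                                  # differentiable render of the 3D scene
    t = torch.randint(20, 980, (1,), device=image.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)         # cumulative alpha for timestep t
    noise = torch.randn_like(image)
    noisy = a_bar.sqrt() * image + (1 - a_bar).sqrt() * noise
    with torch.no_grad():                               # never backprop through the 2D model
        eps_pred = unet(noisy, t, text_emb)             # CFG assumed applied inside unet
    grad = (1 - a_bar) * (eps_pred - noise)             # SDS gradient, skips the U-Net Jacobian
    image.backward(gradient=grad)                       # inject gradient into the 3D parameters
    optimizer.step()
    optimizer.zero_grad()

The key trick is in the last few lines: instead of differentiating through the diffusion model, SDS treats (eps_pred - noise) as the gradient of the rendered image directly.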

Janus Problem

2D models have no notion of 3D consistency, so optimization can produce objects with multiple faces (e.g., a face on the front AND the back of a head). Mitigations: multi-view diffusion models and view-dependent prompts.
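
One common mitigation, used by DreamFusion, is view-dependent prompt augmentation: append a direction hint to the prompt based on the sampled camera azimuth. A sketch (the angle thresholds are illustrative):

def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a view hint so the 2D model sees direction-aware text."""
    azimuth_deg = azimuth_deg % 360
    if azimuth_deg < 45 or azimuth_deg >= 315:
        return f"{prompt}, front view"
    if azimuth_deg < 135 or azimuth_deg >= 225:
        return f"{prompt}, side view"
    return f"{prompt}, back view"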

Variational Score Distillation

ProlificDreamer's improvement: model the distribution of possible 3D outputs rather than collapsing to a single mode, which yields better quality and diversity.
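
Schematically, VSD replaces the fixed noise target in the SDS gradient with the prediction of a LoRA-tuned copy of the diffusion model that is trained online on the current renders. A fragment reusing the names from the SDS sketch above (unet_lora is illustrative):

with torch.no_grad():
    eps_prior = unet(noisy, t, text_emb)            # frozen pretrained 2D prior
    eps_lora = unet_lora(noisy, t, text_emb, cam)   # camera-conditioned copy fit to renders
grad = (1 - a_bar) * (eps_prior - eps_lora)         # VSD gradient (SDS uses eps_pred - noise)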

Classifier-Free Guidance

The guidance scale controls how strongly the text prompt steers the score. SDS typically needs much higher values (50-100) than ordinary image sampling (7-15); setting it too high causes oversaturated colors.
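
CFG combines the conditional and unconditional noise predictions from the same model; schematically:

# eps_cond / eps_uncond: U-Net predictions with the real prompt vs. an empty prompt
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
# guidance_scale ~ 100 in DreamFusion-style SDS, ~7.5 for ordinary text-to-image sampling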

4. 3D Output Formats

Different 3D representations for different use cases.

Point Cloud: a collection of 3D points. Use for quick visualization and processing.
Mesh: vertices + faces (triangles). Use for games, CAD, 3D printing.
NeRF: neural radiance field. Use for novel view synthesis.
3D Gaussians: Gaussian splat representation. Use for real-time rendering.
SDF: signed distance function. Use for Boolean operations, CAD.
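
Most of these can be converted to a mesh for downstream tools. For example, assuming the optional trimesh library is installed, the .ply written by the Shap-E example in the next section can be re-exported to game-friendly formats:

import trimesh

mesh = trimesh.load('output_0.ply')   # mesh written by the Shap-E example below
mesh.export('output_0.obj')           # OBJ for DCC tools and game engines
mesh.export('output_0.glb')           # glTF binary for web viewers
print(len(mesh.vertices), len(mesh.faces))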
5. Code Examples

Get started with text-to-3D generation in Python.

Shap-E (OpenAI): Feed-Forward. Fast, outputs a mesh.

import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

# Load models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# Generate from text
prompt = "a shark"
batch_size = 1

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,      # the remaining kwargs are required by sample_latents
    use_karras=True,    # Karras et al. sampler schedule
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode to mesh (write_ply expects an open binary file, not a path)
for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'output_{i}.ply', 'wb') as f:
        mesh.write_ply(f)
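
To sanity-check a latent before exporting, the same repo's notebook helpers can render a quick turntable preview (this mirrors the official sample notebook; the resolution of 64 is just a quick-look setting):

from shap_e.util.notebooks import create_pan_cameras, decode_latent_images

size = 64                                   # preview render resolution
cameras = create_pan_cameras(size, device)  # ring of cameras around the object
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode='nerf')
    images[0].save(f'preview_{i}.png')      # images is a list of PIL frames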

Quick Reference

For Speed:
  • Shap-E (seconds)
  • LGM (seconds)
  • Instant3D (fast)

For Quality:
  • ProlificDreamer
  • Magic3D
  • MVDream + LGM

For Production:
  • Meshy API
  • Rodin (Hyper3D)
  • Luma Genie

Use Cases

  • Game asset creation
  • 3D printing prototypes
  • Virtual world building
  • Product design iteration

Architectural Patterns

Score Distillation Sampling

Use 2D diffusion models to guide 3D generation.

Pros:
  • Leverages powerful 2D models
  • Creative outputs
Cons:
  • Slow optimization
  • Multi-face (Janus) problems

Feed-Forward 3D Generation

Direct text-to-3D in one pass.

Pros:
  • Fast generation
  • Consistent quality
Cons:
  • Needs 3D training data
  • Limited variety

Multi-View then Reconstruct

Generate multi-view images, then reconstruct 3D.

Pros:
  • Leverages image generation
  • High quality
Cons:
  • Multi-step pipeline
  • View consistency errors can propagate into the reconstruction
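
In pseudocode, the two-stage pattern looks like this (every function name here is a placeholder standing in for real models such as MVDream and LGM, not a real API):

# Stage 1: text -> N consistent views (e.g. an MVDream-style multi-view diffusion model)
views = multiview_diffusion("a ceramic teapot", num_views=4, guidance_scale=7.5)

# Stage 2: views + known camera poses -> 3D (e.g. an LGM-style feed-forward reconstructor)
cameras = orbit_cameras(num_views=4, elevation_deg=20)
asset = reconstruct(views, cameras)   # 3D Gaussians, NeRF, or mesh depending on the model
asset.save("teapot.ply")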

Implementations

API Services

Meshy (API): Production text-to-3D API. Various styles.

Luma AI Genie (Luma AI, API): High-quality text-to-3D. Mobile app + API.

Open Source

Shap-E (MIT): OpenAI's text-to-3D. Fast generation, moderate quality.

Point-E (MIT): Text to point cloud to mesh. Quick results.

MVDream (Apache 2.0): Multi-view diffusion for 3D. Consistent views.

Benchmarks

Quick Facts

Input: Text
Output: 3D Model
Implementations: 3 open source, 2 API
Patterns: 3 approaches

Submit Results