Text to 3D
Generate 3D models from text descriptions. Enables rapid prototyping and creative 3D content generation.
How Text-to-3D Works
A technical deep-dive into text-to-3D generation. From Score Distillation Sampling to feed-forward models and multi-view reconstruction.
Text-to-3D Approaches
Four main paradigms for generating 3D content from text descriptions.
- Score Distillation Sampling (SDS): optimize a 3D representation using guidance from a 2D diffusion model
- Feed-forward generation: predict 3D directly from text embeddings in a single pass
- Native 3D diffusion: run the diffusion process directly in 3D space
- Multi-view reconstruction: generate consistent 2D views, then reconstruct the 3D model from them
Speed vs Quality Tradeoff
Feed-forward models such as Shap-E and LGM generate in seconds but trade away some fidelity; optimization-based methods such as ProlificDreamer take far longer per asset but reach higher quality.
Model Evolution
From the first CLIP-guided methods to modern feed-forward generation.
Score Distillation Sampling (SDS)
The key insight of DreamFusion: use a 2D diffusion model to optimize a 3D representation.
How SDS Works
- The core trick: render the current 3D model, add noise, and ask the 2D diffusion model which noise it would remove to make the image match the prompt. The difference between its prediction and the true noise becomes the gradient that updates the 3D model (a minimal sketch in code follows this list).
- The Janus (multi-face) problem: 2D models don't understand 3D consistency, so objects can grow multiple faces (e.g., a face on the front AND the back). Mitigations: multi-view diffusion models and view-dependent prompts.
- Variational Score Distillation (ProlificDreamer's improvement): model the distribution of possible 3D outputs rather than a single mode, which improves both quality and diversity.
- Guidance scale: how strongly the text prompt steers the update. SDS uses much higher values (50-100) than ordinary image sampling (7-15); pushing it too high causes oversaturation.
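The sketch below shows how the core SDS update might look in PyTorch. It is a minimal illustration under assumed interfaces, not DreamFusion's actual code: unet stands for any frozen text-conditioned 2D diffusion model, alphas_cumprod for its noise schedule, and rendered for a differentiable rendering of the current 3D model.
import torch

def sds_loss(rendered, text_emb, null_emb, unet, alphas_cumprod, guidance_scale=100.0):
    """One Score Distillation Sampling step (illustrative; interfaces assumed).

    Noise a rendering of the current 3D model, ask the frozen 2D diffusion
    model to predict that noise given the prompt, and turn the residual into
    a gradient for the 3D parameters. The U-Net Jacobian is skipped, as in
    DreamFusion, and the guidance scale is far higher (50-100) than the 7-15
    used for ordinary image sampling.
    """
    t = torch.randint(20, 980, (1,), device=rendered.device)  # random timestep
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rendered + (1 - a_t).sqrt() * noise

    with torch.no_grad():                      # the 2D model stays frozen
        eps_text = unet(noisy, t, text_emb)    # prompt-conditioned prediction
        eps_null = unet(noisy, t, null_emb)    # unconditional prediction
        # Classifier-free guidance with a high scale
        eps = eps_null + guidance_scale * (eps_text - eps_null)

    w = 1.0 - a_t                              # timestep weighting
    grad = w * (eps - noise)                   # SDS gradient w.r.t. the rendering

    # Surrogate loss whose gradient w.r.t. `rendered` equals `grad`; calling
    # .backward() on it pushes the update through the differentiable renderer
    # into the 3D representation (NeRF, mesh, or Gaussians).
    return (grad.detach() * rendered).sum()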
3D Output Formats
Different 3D representations for different use cases.
- Point cloud: a collection of 3D points
- Mesh: vertices + faces (triangles); see the small example after this list
- NeRF: a neural radiance field
- 3D Gaussian splatting: a Gaussian splat representation
- SDF: a signed distance function
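As a concrete example of the most common interchange format, the snippet below builds a tiny triangle mesh from raw vertex and face arrays using the trimesh library (an assumed dependency, not part of any generator above) and exports it to PLY and OBJ.
import numpy as np
import trimesh  # assumed dependency: pip install trimesh

# A minimal triangle mesh: 4 vertices and 4 triangular faces (a tetrahedron)
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2]])
mesh = trimesh.Trimesh(vertices=vertices, faces=faces)

# Export to common interchange formats
mesh.export('tetra.ply')
mesh.export('tetra.obj')
print(mesh.is_watertight, mesh.volume)  # quick sanity checks on the geometry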
Code Examples
Get started with text-to-3D generation in Python.
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh
# Load models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))
# Generate from text
prompt = "a shark"
batch_size = 1
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    # Sampler settings below follow the official Shap-E text-to-3D example
    use_fp16=(device.type == 'cuda'),
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)
# Decode each latent to a triangle mesh and save as PLY
for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'output_{i}.ply', 'wb') as f:
        mesh.write_ply(f)
Quick Reference
- Fast: Shap-E (seconds), LGM (seconds), Instant3D (fast)
- High quality: ProlificDreamer, Magic3D, MVDream + LGM
- Hosted APIs: Meshy API, Rodin (Hyper3D), Luma Genie
Use Cases
- ✓ Game asset creation
- ✓ 3D printing prototypes
- ✓ Virtual world building
- ✓ Product design iteration
Architectural Patterns
Score Distillation Sampling
Use 2D diffusion models to guide 3D generation.
- + Leverages powerful 2D models
- + Creative outputs
- − Slow optimization
- − Multi-face problems
Feed-Forward 3D Generation
Direct text-to-3D in one pass.
- + Fast generation
- + Consistent quality
- − Needs 3D training data
- − Limited variety
Multi-View then Reconstruct
Generate multi-view images of the object, then reconstruct 3D from them (a structural sketch in code follows this list).
- + Leverages image generation
- + High quality
- − Multi-step pipeline
- − View consistency is hard to guarantee
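To make the two-stage structure concrete, here is a sketch of the multi-view-then-reconstruct pattern. The two stage functions are stubs standing in for real models (for example MVDream for view generation and LGM or an LRM-style network for reconstruction); every name here is a placeholder, not a real API.
from typing import List
import numpy as np

def generate_views(prompt: str, azimuths: List[float]) -> List[np.ndarray]:
    """Stage 1 (stub): a multi-view diffusion model renders the same object
    from several camera azimuths, conditioned on the text prompt."""
    # Placeholder: return blank RGB images of the expected shape.
    return [np.zeros((256, 256, 3), dtype=np.uint8) for _ in azimuths]

def reconstruct_3d(views: List[np.ndarray], azimuths: List[float]) -> dict:
    """Stage 2 (stub): a sparse-view reconstructor lifts the images into a
    3D representation (mesh, NeRF, or Gaussians)."""
    # Placeholder: return a dictionary describing the (empty) result.
    return {"views_used": len(views), "representation": "mesh"}

# The pipeline itself is just these two calls; the hard part is keeping the
# generated views geometrically consistent with each other.
azimuths = [0.0, 90.0, 180.0, 270.0]           # four canonical viewpoints
views = generate_views("a shark", azimuths)    # consistent 2D views
model_3d = reconstruct_3d(views, azimuths)     # lift to 3D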
Implementations
API Services
Meshy
Production text-to-3D API. Various styles.
Luma AI Genie
High-quality text-to-3D. Mobile app + API.
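Hosted services typically expose generation as an asynchronous job: submit a task, poll until it completes, then download the asset. The sketch below shows only that general pattern; the endpoint, response fields, and credential are placeholders, not any vendor's documented API.
import time
import requests  # pip install requests

API_URL = "https://api.example.com/v1/text-to-3d"   # placeholder, not a real endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # placeholder credential

# 1. Submit a generation task
task = requests.post(API_URL, headers=HEADERS, json={"prompt": "a shark"}).json()
task_id = task["id"]                                 # assumed response field

# 2. Poll until the job finishes
while True:
    status = requests.get(f"{API_URL}/{task_id}", headers=HEADERS).json()
    if status.get("status") in ("succeeded", "failed"):
        break
    time.sleep(5)

# 3. Download the generated model (field name assumed)
if status["status"] == "succeeded":
    model_bytes = requests.get(status["model_url"]).content
    with open("output.glb", "wb") as f:
        f.write(model_bytes)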
Benchmarks
Quick Facts
- Input: Text
- Output: 3D Model
- Implementations: 3 open source, 2 API
- Patterns: 3 approaches