TRELLIS.2: Production-Ready 3D Assets in 3 Seconds

Microsoft Research has released a 4B-parameter image-to-3D model that generates game-ready PBR assets from a single image. Its novel O-Voxel representation enables resolutions up to 1536^3 with full transparency and arbitrary topology support.

Key Takeaways

  • 4B parameters (2x TRELLIS v1), MIT license for commercial use
  • 512^3 resolution in 3 seconds, 1024^3 in 17 seconds on H100
  • Full PBR output: base color, roughness, metallic, opacity/transparency
  • User studies: 4.55/5 satisfaction, 4.82/5 intent alignment

What is TRELLIS.2?

TRELLIS.2 is Microsoft Research's latest image-to-3D generation model, a direct continuation of the team's CVPR'25 spotlight work, TRELLIS. The model takes a single image as input and generates complete 3D assets with physically based rendering (PBR) materials, suitable for immediate use in game engines and production pipelines.

Unlike previous approaches that struggle with complex topology, TRELLIS.2 handles open surfaces, non-manifold geometry, and internal structures. This makes it the first truly production-ready open-source image-to-3D model for game development and digital asset creation.

Technical Architecture: O-Voxel Representation

The core innovation in TRELLIS.2 is the O-Voxel (omni-voxel) representation, a field-free sparse voxel structure that changes how 3D geometry is represented during generation. Earlier approaches typically rely on implicit fields, such as NeRF-style radiance fields or signed distance functions (SDFs), which require expensive per-point queries during mesh extraction.

How O-Voxel Works

O-Voxel represents 3D shapes as a sparse set of occupied voxels, each storing:

  • Occupancy probability: Whether the voxel contains surface geometry
  • Surface normal: Orientation of the local surface
  • PBR attributes: Base color, roughness, metallic, and opacity values

This sparse representation enables scaling to 1536^3 resolution, approximately 3.6 billion potential voxels, while only storing and processing the occupied subset. The sparsity pattern itself encodes the shape, eliminating the need for expensive field evaluations.
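To make the storage argument concrete, here is a minimal sketch of a sparse voxel container in Python. It is not the TRELLIS.2 data structure: the class names, field names, and dictionary-based layout are assumptions chosen only to mirror the per-voxel attributes listed above and to show that memory grows with the occupied subset rather than with the full grid.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


@dataclass
class VoxelAttributes:
    """Hypothetical per-voxel payload mirroring the attributes listed above."""
    occupancy: float                        # probability the voxel contains surface
    normal: Tuple[float, float, float]      # orientation of the local surface
    base_color: Tuple[float, float, float]  # RGB in [0, 1]
    roughness: float
    metallic: float
    opacity: float


class SparseVoxelGrid:
    """Stores only occupied voxels, keyed by integer (x, y, z) coordinates."""

    def __init__(self, resolution: int) -> None:
        self.resolution = resolution
        self.voxels: Dict[Coord, VoxelAttributes] = {}

    def insert(self, coord: Coord, attrs: VoxelAttributes) -> None:
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"{coord} lies outside a {self.resolution}^3 grid")
        self.voxels[coord] = attrs

    def is_occupied(self, coord: Coord) -> bool:
        # The sparsity pattern itself answers occupancy; no field is evaluated.
        return coord in self.voxels

    @property
    def dense_cell_count(self) -> int:
        return self.resolution ** 3


grid = SparseVoxelGrid(resolution=1536)
glass = VoxelAttributes(
    occupancy=0.98,
    normal=(0.0, 0.0, 1.0),
    base_color=(0.9, 0.95, 1.0),
    roughness=0.05,
    metallic=0.0,
    opacity=0.3,
)
grid.insert((10, 20, 30), glass)

print(grid.dense_cell_count)  # 3623878656 potential voxels at 1536^3
print(len(grid.voxels))       # 1 voxel actually stored
```

A production implementation would almost certainly use packed tensors of coordinates and features rather than a Python dictionary; the dictionary only keeps the idea readable.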

Sparse Compression VAE

TRELLIS.2 introduces a Sparse Compression Variational Autoencoder (SC-VAE) with 16x spatial downsampling. This compresses the O-Voxel representation into a compact latent space where the diffusion process operates. At 1024^3 resolution, this produces approximately 9,600 tokens, making generation tractable on current hardware.
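The arithmetic behind that token count can be checked with a short sketch. It assumes that 16x downsampling applies to each spatial axis and that a token corresponds to one occupied latent cell; both are assumptions made for illustration, and the token figure is the approximate value quoted in this article.

```python
DOWNSAMPLE = 16          # spatial downsampling factor stated for the SC-VAE
REPORTED_TOKENS = 9_600  # approximate token count quoted for 1024^3


def latent_grid_side(resolution: int, factor: int = DOWNSAMPLE) -> int:
    """Side length of the latent grid after per-axis downsampling."""
    if resolution % factor != 0:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return resolution // factor


def dense_latent_cells(resolution: int) -> int:
    """Number of latent cells if the grid were stored densely."""
    return latent_grid_side(resolution) ** 3


side = latent_grid_side(1024)
dense = dense_latent_cells(1024)
print(f"latent grid: {side}^3 = {dense:,} cells")
print(f"occupied fraction: {REPORTED_TOKENS / dense:.1%}")
```

Under these assumptions the latent grid at 1024^3 is 64^3 = 262,144 cells, so roughly 9,600 tokens means only a few percent of the latent cells are occupied. That sparsity is what keeps the transformer's sequence length manageable.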

Rectified Flow Transformer Backbone

The generation backbone is a transformer trained with rectified flow, a flow-matching formulation closely related to diffusion that learns straighter sampling trajectories from noise to data. Straighter trajectories allow high-quality generation in fewer sampling steps than traditional DDPM or DDIM samplers.
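The following toy sketch shows why straight trajectories need few steps. It integrates a rectified-flow ODE with plain Euler steps, using a hand-written velocity field that points at a single target as a stand-in for the trained transformer. The function names and the velocity field are illustrative assumptions, not part of TRELLIS.2.

```python
from typing import List, Sequence


def toy_velocity(x: Sequence[float], t: float, target: Sequence[float]) -> List[float]:
    """Ideal velocity for a straight path that reaches `target` at t = 1.

    Hypothetical stand-in for the learned network: on the straight line
    x_t = (1 - t) * x0 + t * target, this equals the constant (target - x0).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(x0: Sequence[float], target: Sequence[float], num_steps: int) -> List[float]:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.8, -1.3, 0.1]   # starting sample (would be Gaussian noise)
data = [2.0, 0.5, -1.0]    # target sample (would be a latent code)

for steps in (1, 4, 32):
    print(steps, euler_sample(noise, data, steps))
```

Because the path is perfectly straight in this toy case, one Euler step lands on the target just as well as 32 do. A trained model only approximates straight paths, so real samplers still use several steps, just fewer than curved diffusion trajectories would need.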

Performance Benchmarks

Reported generation times for TRELLIS.2 scale with the target resolution:

Resolution | Generation Time | Token Count | Use Case
512^3      | 3 seconds       | ~1.2K       | Real-time preview, prototyping
1024^3     | 17 seconds      | ~9.6K       | Production assets, game development
1536^3     | 60 seconds      | ~32K        | High-detail cinematics, VFX

All benchmarks were measured on an NVIDIA H100 GPU. Consumer hardware (RTX 4090) takes approximately 2-3x longer.
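For capacity planning, the figures above translate into rough throughput numbers. The sketch below uses only the H100 timings and the 2-3x consumer slowdown quoted in this article; it ignores model loading, export time, and VRAM limits, so treat the results as upper bounds rather than measurements.

```python
from typing import Tuple

# H100 generation times in seconds, as quoted in the table above.
H100_SECONDS = {512: 3, 1024: 17, 1536: 60}

# Consumer-GPU slowdown range quoted for the RTX 4090.
CONSUMER_SLOWDOWN = (2.0, 3.0)


def assets_per_hour(seconds_per_asset: float) -> float:
    """Back-to-back generation throughput, ignoring all overheads."""
    return 3600 / seconds_per_asset


def consumer_time_range(h100_seconds: float,
                        slowdown: Tuple[float, float] = CONSUMER_SLOWDOWN) -> Tuple[float, float]:
    """Estimated (best, worst) seconds per asset on consumer hardware."""
    low, high = slowdown
    return (h100_seconds * low, h100_seconds * high)


for resolution, seconds in H100_SECONDS.items():
    best, worst = consumer_time_range(seconds)
    print(
        f"{resolution}^3: {assets_per_hour(seconds):.0f} assets/hour on H100, "
        f"{best:.0f}-{worst:.0f} s per asset on consumer hardware"
    )
```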

User Study Results

Microsoft conducted extensive user studies comparing TRELLIS.2 against competing methods:

  • Overall satisfaction: 4.55/5.00
  • Intent alignment: 4.82/5.00 (how well the output matches user expectations)
  • Geometry quality: 4.61/5.00
  • Material quality: 4.43/5.00

Comparison with Competitors

The image-to-3D space has seen rapid development, with several notable models competing for production adoption:

Model            | Parameters | PBR Support | Transparency | License
TRELLIS.2        | 4B         | Full        | Yes          | MIT
Hunyuan3D-2      | 2B         | Full        | Limited      | Tencent
Point-E (OpenAI) | 1B         | No          | No           | MIT
Shap-E (OpenAI)  | 300M       | Basic       | No           | MIT
Wonder3D         | ~1B        | Partial     | No           | CC-BY-NC

Key Differentiators

TRELLIS.2 distinguishes itself from competitors in several critical areas:

  • Arbitrary topology: Unlike SDF-based methods, O-Voxel handles non-watertight meshes, open surfaces, and complex internal structures
  • Full transparency: First open model to properly generate translucent and transparent materials (glass, liquids, ice)
  • Production formats: Direct export to GLB, OBJ, STL, GLTF, USDZ, and PLY without post-processing
  • MIT license: Full commercial use permitted, unlike many alternatives with NC restrictions

Output Formats and PBR Pipeline

TRELLIS.2 generates complete PBR-ready assets with the following material channels:

  • Base Color (Albedo): RGB diffuse color without lighting information
  • Roughness: Surface roughness, from mirror-smooth (0.0) to fully rough (1.0)
  • Metallic: Metallic/dielectric blend factor
  • Opacity: Full alpha channel support for transparency

Supported export formats include GLB (web/games), OBJ (legacy pipelines), STL (3D printing), GLTF (web), USDZ (Apple ecosystem), and PLY (point clouds/research).
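As an illustration of how these four channels map onto a common interchange format, the sketch below builds a glTF 2.0 material entry using the core `pbrMetallicRoughness` schema. The helper name and the sample values are hypothetical, and this is not the TRELLIS.2 exporter: real assets would normally carry per-texel texture maps rather than the single scalar factors shown here.

```python
import json
from typing import Any, Dict, Tuple


def gltf_material(name: str,
                  base_color: Tuple[float, float, float],
                  roughness: float,
                  metallic: float,
                  opacity: float) -> Dict[str, Any]:
    """Build a glTF 2.0 material dict from scalar PBR channel values."""
    for label, value in (("roughness", roughness),
                         ("metallic", metallic),
                         ("opacity", opacity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")

    r, g, b = base_color
    material: Dict[str, Any] = {
        "name": name,
        "pbrMetallicRoughness": {
            # glTF stores opacity as the alpha component of the base color.
            "baseColorFactor": [r, g, b, opacity],
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
    }
    if opacity < 1.0:
        # The glTF default is "OPAQUE"; blending must be requested explicitly.
        material["alphaMode"] = "BLEND"
    return material


glass = gltf_material("glass", (0.9, 0.95, 1.0), roughness=0.05, metallic=0.0, opacity=0.3)
print(json.dumps(glass, indent=2))
```

Folding opacity into the base color's alpha and switching `alphaMode` to `BLEND` is how glTF represents simple transparency. Formats such as STL carry no material data at all, so the PBR channels only survive in material-aware formats.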

Use Cases for Game Developers

TRELLIS.2 addresses several pain points in game asset creation workflows:

Rapid Prototyping

At 3 seconds per asset (512^3), designers can iterate on visual concepts almost interactively, generating dozens of variations from concept art sketches before committing to manual modeling.

Background Asset Generation

For open-world games requiring thousands of environmental props, TRELLIS.2 can generate variety at scale. Generate rocks, debris, vegetation, and architectural details from reference images.

Reference-Based Modeling

Use TRELLIS.2 output as a starting point for manual refinement. Generate a base mesh from concept art, then refine topology and add detail in traditional DCC tools.

Indie Game Development

Small teams without dedicated 3D artists can generate production-quality assets directly from concept images or photographs. The MIT license permits commercial game releases.

Recommendations

When to Use TRELLIS.2

  • Game asset production requiring PBR materials and transparency
  • Rapid prototyping where 3-second generation enables real-time iteration
  • Commercial projects requiring permissive MIT licensing
  • Assets with complex topology: glass objects, foliage, mechanical parts

When to Consider Alternatives

  • Hero assets requiring hand-crafted topology and edge flow
  • Rigged characters needing animation-ready topology
  • Extremely high-poly sculpts (consider ZBrush pipelines)
  • Projects without H100/A100 access (consumer GPUs are slower)

Hardware Requirements

TRELLIS.2 requires significant GPU memory for high-resolution generation:

  • 512^3 resolution: 16GB VRAM minimum (RTX 4080, A4000)
  • 1024^3 resolution: 40GB VRAM recommended (A100, A6000)
  • 1536^3 resolution: 80GB VRAM (H100, A100 80GB)
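The tiers above can be turned into a simple pre-flight check. This sketch treats each listed figure as a hard threshold, which is an assumption: the article gives 16GB as a minimum but 40GB only as a recommendation, so actual limits may differ. The function name is hypothetical.

```python
from typing import Optional

# (resolution, required VRAM in GB), highest resolution first,
# taken from the list above.
VRAM_TIERS_GB = [(1536, 80), (1024, 40), (512, 16)]


def max_resolution(vram_gb: float) -> Optional[int]:
    """Return the highest listed resolution that fits, or None if none does."""
    for resolution, required_gb in VRAM_TIERS_GB:
        if vram_gb >= required_gb:
            return resolution
    return None


for card, vram in (("RTX 4090", 24), ("A6000", 48), ("H100", 80), ("12GB card", 12)):
    print(card, max_resolution(vram))
```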

Cloud deployment on GPU instances is recommended for teams without local hardware. RunPod, Lambda Labs, and the major cloud providers all offer on-demand H100 instances suitable for batch processing.

Conclusion

TRELLIS.2 represents a significant advancement in AI-driven 3D asset generation. The combination of novel O-Voxel representation, production-ready PBR output, full transparency support, and MIT licensing makes it the current best choice for commercial game development and digital asset creation.

For teams evaluating image-to-3D solutions, TRELLIS.2 should be the primary candidate for production pipelines requiring quality, speed, and legal clarity. The 3-second generation time at 512^3 enables workflows that were previously impractical with slower alternatives.