Black Forest Labs' fastest model, combining text-to-image generation with multi-reference editing in a single 4B-parameter architecture. Sub-second inference with quality that rivals models 10x its size. Powers LoRA training (Step 0), initial image generation (Step 1), and text-guided multi-view synthesis (Step 2).
Generates 6 geometrically consistent views from a single reference image using learned 3D priors. Unlike FLUX's text-guided approach, Zero123++ produces views with strict camera angle consistency, making it better suited for 3D reconstruction. Includes automatic background removal via rembg.
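A minimal sketch of how the six views might be separated for downstream use. It assumes the views arrive tiled in one grid image, 2 columns by 3 rows (for example 640x960 pixels), which is the layout the public Zero123++ pipeline is understood to return; the layout used by this project is not stated and may differ. The function name `grid_crop_boxes` is hypothetical.

```python
from typing import List, Tuple

# (left, upper, right, lower), the box convention used by Pillow's Image.crop
Box = Tuple[int, int, int, int]


def grid_crop_boxes(width: int, height: int, cols: int = 2, rows: int = 3) -> List[Box]:
    """Return one crop box per tile, ordered row by row, left to right.

    Assumption: the multi-view output is a single image tiled as
    `cols` x `rows` equally sized views (2 x 3 = 6 by default).
    """
    if width % cols or height % rows:
        raise ValueError("image size is not divisible by the grid shape")
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


boxes = grid_crop_boxes(640, 960)
```

Each box can be passed to an image library's crop call to get an individual view before background removal and reconstruction.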
Tencent's large-scale 3D generation system. Converts multi-view images into detailed 3D meshes using a flow-based diffusion transformer for shape generation. Configurable octree resolution and guidance scale for balancing quality vs. speed. Outputs GLB format with PBR-ready geometry.
Prepares 3D models for physical fabrication. Scales the mesh to the target shoe size (US 8-13), flattens the sole for a proper contact surface, fills holes, repairs normals, and exports a watertight STL for 3D printing. Runs on CPU-only infrastructure.
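The scaling and sole-flattening steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's implementation: it assumes the common US men's sizing rule (size = 3 x last length in inches - 22), assumes the shoe's longest bounding-box axis is its length, and assumes z points up. All function names are hypothetical. Hole filling, normal repair, and STL export are left out, since they would normally be delegated to a mesh library.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
MM_PER_INCH = 25.4


def us_mens_size_to_length_mm(size: float) -> float:
    """Assumed rule: US men's size = 3 * last length (inches) - 22."""
    if not 8 <= size <= 13:
        raise ValueError("supported sizes are US 8-13")
    return (size + 22) / 3 * MM_PER_INCH


def scale_to_length(vertices: List[Vertex], target_mm: float) -> List[Vertex]:
    """Uniformly scale so the longest bounding-box extent equals target_mm."""
    extents = [
        max(v[i] for v in vertices) - min(v[i] for v in vertices)
        for i in range(3)
    ]
    longest = max(extents)
    if longest <= 0:
        raise ValueError("mesh has no extent")
    factor = target_mm / longest
    return [(x * factor, y * factor, z * factor) for x, y, z in vertices]


def flatten_sole(vertices: List[Vertex], tolerance_mm: float = 1.0) -> List[Vertex]:
    """Snap every vertex within tolerance_mm of the lowest z onto that plane."""
    floor = min(v[2] for v in vertices)
    return [
        (x, y, floor if z - floor <= tolerance_mm else z)
        for x, y, z in vertices
    ]


# Toy input: four vertices in arbitrary model units.
raw = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.001), (2.0, 1.0, 0.5), (0.0, 1.0, 0.5)]
scaled = scale_to_length(raw, us_mens_size_to_length_mm(8))
flat = flatten_sole(scaled)
```

Scaling uniformly preserves the shoe's proportions; snapping only near-floor vertices gives a flat contact surface without distorting the upper.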
All models run on demand on NVIDIA GPUs with automatic scaling. Pre-baked container images with cached weights keep cold starts fast. Per-second billing with no idle charges. 6 active endpoints across the pipeline.
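To make the per-second billing model concrete, here is a small arithmetic sketch. The step names, durations, and rates are placeholders chosen for illustration, not the project's measurements or any provider's actual prices.

```python
from typing import Dict


def pipeline_cost(durations_s: Dict[str, float], rate_per_s: Dict[str, float]) -> float:
    """Sum of (seconds of compute) x (price per second) for each step.

    With per-second billing and no idle charges, time spent between
    requests contributes nothing to the total.
    """
    return sum(durations_s[step] * rate_per_s[step] for step in durations_s)


# Placeholder numbers, for illustration only.
durations = {"generate": 2.0, "reconstruct": 4.0}
rates = {"generate": 0.5, "reconstruct": 0.25}
total = pipeline_cost(durations, rates)
```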
Browser-based 3D rendering with orbit controls, studio lighting (key + fill + rim), and PBR material support. Loads GLB models directly for interactive inspection. Used in both Step 3 (reconstruction preview) and Step 4 (print-ready preview).
Static frontend served from GitHub Pages. Zero-config deployment via git push. All compute happens server-side on Modal; the frontend is pure HTML/CSS/JS with no build step.