DreamFlow: High-quality text-to-3D generation by Approximating Probability Flow
An optimization framework for text-to-3D generation that leverages a predetermined timestep schedule to improve efficiency and quality over existing methods.
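As a rough sketch of what a predetermined, decreasing timestep schedule might look like (the linear ramp and the endpoint values below are assumptions for illustration; the paper defines the actual schedule):

```python
import numpy as np

def decreasing_timestep_schedule(num_iters: int, t_max: float = 0.98, t_min: float = 0.02) -> np.ndarray:
    """Monotonically decreasing diffusion timesteps, fixed before optimization starts.

    Early iterations use large (noisy) timesteps to shape coarse structure;
    later iterations use small timesteps to sharpen detail.
    """
    return np.linspace(t_max, t_min, num_iters)

# Each optimization iteration i is tied to schedule[i] instead of a randomly sampled timestep.
schedule = decreasing_timestep_schedule(1000)
```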
DreamFlow is a text-to-3D generation framework that optimizes 3D representations to create high-quality, high-resolution 3D content. It builds on an optimization strategy that approximates the probability flow ODE of diffusion generative models, tailored to 3D scene optimization. The framework consists of three stages: training a NeRF from scratch, fine-tuning the extracted 3D mesh, and refining the mesh using high-resolution diffusion priors.
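For context, the probability flow ODE of a score-based diffusion model, in the standard formulation, is the deterministic process

```latex
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t}
  = f(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^{2}\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x}),
```

where $f$ is the drift, $g$ the diffusion coefficient, and the score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is estimated by the pretrained diffusion model. How this ODE is approximated for optimizing a 3D scene, rather than a single image, is the subject of the paper.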
In the first stage, a NeRF is optimized using a multi-resolution hash grid encoder with MLPs that predict RGB colors and densities. A latent diffusion model serves as the diffusion prior, and training involves rendering images and applying APFO (approximate probability flow ODE optimization) with a decreasing initial timestep. The second stage converts the neural field into a Signed Distance Field (SDF) and disentangles geometry and texture by optimizing them sequentially. The third stage refines the 3D mesh with high-resolution diffusion priors to enhance aesthetic quality.
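To make the first-stage loop concrete, here is a minimal PyTorch sketch of a single scene update at a scheduled timestep. The SDS-like gradient form, the cosine noise schedule, and all function names are assumptions made for illustration, not the exact APFO update from the paper:

```python
import torch

def scene_update_step(render, predict_noise, params, t, optimizer):
    """One illustrative update of 3D scene parameters at scheduled timestep t in (0, 1).

    `render` maps scene parameters to an image (or latent); `predict_noise` is the
    frozen diffusion prior's epsilon prediction.
    """
    x = render(params)                                             # differentiable rendering
    noise = torch.randn_like(x)
    alpha_bar = torch.cos(torch.tensor(t) * torch.pi / 2) ** 2     # assumed cosine noise schedule
    x_t = alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * noise  # noised rendering at timestep t
    with torch.no_grad():
        eps_hat = predict_noise(x_t, t)                            # prior's denoising direction
    # The residual between predicted and injected noise drives the scene parameters.
    loss = ((eps_hat - noise) * x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in renderer and prior (not a real NeRF or latent diffusion model):
params = torch.randn(4, 64, 64, requires_grad=True)
opt = torch.optim.Adam([params], lr=1e-2)
scene_update_step(lambda p: p, lambda x_t, t: torch.randn_like(x_t), params, t=0.8, optimizer=opt)
```

In this sketch, the scheduled timestep t would come from the predetermined decreasing schedule shown earlier rather than being drawn at random each iteration.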
DreamFlow's optimization algorithm approximates the probability flow ODE of diffusion generative models, enabling efficient and scalable 3D scene optimization. Following this coarse-to-fine approach, the framework generates photorealistic 3D content with detailed textures and shapes. In human preference studies and quantitative comparisons, the method outperforms existing approaches in both quality and speed, demonstrating its effectiveness in generating high-quality 3D content from text prompts.