GVGEN: Text-to-3D Generation with Volumetric Representation
Efficiently generates 3D Gaussian representations from text input
The paper introduces GVGEN, a novel diffusion-based framework designed to efficiently generate 3D Gaussian representations from text input. The framework consists of two main stages: GaussianVolume fitting and text-to-3D generation. In the GaussianVolume fitting stage, the authors introduce GaussianVolume, a structured volumetric form composed of 3D Gaussians. They propose a pruning and densifying method named the Candidate Pool Strategy to enhance detail fidelity through selective optimization. This approach allows high-quality volumetric representations of Gaussians to be fitted, which makes the data better suited to a diffusion-based framework. The authors also propose a coarse-to-fine generation pipeline to simplify the generation of GaussianVolume and enable the model to generate instances with detailed 3D geometry.
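To make the idea of a "structured volumetric form composed of 3D Gaussians" more concrete, here is a minimal sketch of one way such a volume could be laid out: a fixed grid in which every voxel holds the attributes of one Gaussian. The class names, the attribute set (offset, scale, rotation, opacity, color, as commonly used in 3D Gaussian Splatting), and the unit-cube grid placement are all assumptions made for illustration; the summary above does not specify the paper's actual layout.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical per-voxel Gaussian attributes (assumed, not from the paper)."""
    offset: tuple = (0.0, 0.0, 0.0)         # displacement from the voxel's grid point
    scale: tuple = (1.0, 1.0, 1.0)          # per-axis extent
    rotation: tuple = (1.0, 0.0, 0.0, 0.0)  # unit quaternion (w, x, y, z)
    opacity: float = 0.0
    color: tuple = (0.0, 0.0, 0.0)          # RGB


class GaussianVolume:
    """A fixed-size grid with exactly one Gaussian per voxel (illustrative only)."""

    def __init__(self, resolution):
        self.resolution = resolution
        self.voxels = [Gaussian() for _ in range(resolution ** 3)]

    def index(self, i, j, k):
        """Flatten a 3D voxel coordinate into a position in self.voxels."""
        r = self.resolution
        if not all(0 <= a < r for a in (i, j, k)):
            raise IndexError("voxel coordinate out of range")
        return (i * r + j) * r + k

    def grid_point(self, i, j, k):
        """Voxel center inside the unit cube (an assumed convention)."""
        r = self.resolution
        return ((i + 0.5) / r, (j + 0.5) / r, (k + 0.5) / r)

    def center(self, i, j, k):
        """Gaussian center = grid point + the offset stored in that voxel."""
        g = self.voxels[self.index(i, j, k)]
        return tuple(p + o for p, o in zip(self.grid_point(i, j, k), g.offset))
```

Because the number and ordering of Gaussians is fixed by the grid, the whole object becomes a dense tensor-like array, which is the kind of regular input a diffusion model or 3D U-Net can consume.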
In the text-to-3D generation stage, the authors split the process into two steps: coarse geometry generation and Gaussian attribute prediction. They employ a diffusion model to generate the coarse geometry of objects, termed the Gaussian Distance Field (GDF), which records the distance from each grid point to the center of the nearest Gaussian point. The generated GDF, together with the text input, is then processed by a 3D U-Net-based model to predict the attributes of the GaussianVolume, which the authors report improves control and model convergence. The authors state that this is the first study to generate 3D Gaussians from text directly in a feed-forward manner, opening new avenues for rapid 3D content creation and applications.
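The GDF definition above (distance from each grid point to the nearest Gaussian center) can be illustrated with a small brute-force sketch. The function name, the unit-cube grid with points at voxel centers, and the use of plain Euclidean distance without truncation or normalization are assumptions for illustration; the paper may define these details differently.

```python
import math
from itertools import product


def gaussian_distance_field(centers, resolution):
    """Distance from every grid point to its nearest Gaussian center.

    centers:    list of (x, y, z) Gaussian centers, assumed to lie in [0, 1]^3
    resolution: number of grid points per axis
    Returns a nested list gdf[i][j][k] of Euclidean distances.
    """
    if not centers:
        raise ValueError("need at least one Gaussian center")
    # Assumption: grid points sit at voxel centers of the unit cube.
    coords = [(i + 0.5) / resolution for i in range(resolution)]
    gdf = [[[0.0] * resolution for _ in range(resolution)]
           for _ in range(resolution)]
    for i, j, k in product(range(resolution), repeat=3):
        point = (coords[i], coords[j], coords[k])
        # Brute-force nearest-center search; fine for a small illustration.
        gdf[i][j][k] = min(math.dist(point, c) for c in centers)
    return gdf


# Two Gaussians placed exactly on two grid points of a 2x2x2 grid.
field = gaussian_distance_field([(0.25, 0.25, 0.25), (0.75, 0.75, 0.75)], 2)
print(field[0][0][0], field[0][0][1])
```

A field like this is low in value near the object and grows away from it, so it acts as a coarse description of geometry that the second step can then refine into full Gaussian attributes.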