Disentangled 3D Scene Generation with Layout Learning

An unsupervised method that uses a pretrained text-to-image model to disentangle 3D scenes into individual objects, optimizing one Neural Radiance Field (NeRF) per object and composing them into scenes with learned layouts

The paper introduces a method for generating 3D scenes that are disentangled into their component objects. This disentanglement is achieved through an unsupervised approach that leverages a large pretrained text-to-image model. The key insight is to discover objects by identifying parts of a 3D scene that, when spatially rearranged, still form valid configurations of the same scene. The method optimizes multiple Neural Radiance Fields (NeRFs) from scratch, with each NeRF representing a distinct object, along with a set of layouts that composite these objects into scenes. The composited scenes are then encouraged to be in-distribution according to the image generator. Despite its simplicity, the approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
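
To make the optimization loop described above concrete, the following is a minimal Python/PyTorch sketch. It is not the authors' implementation: `ObjectNeRF`, `Layout`, `render_composite`, and `sds_loss` are hypothetical stand-ins, and treating the "in-distribution" objective as a score-distillation-style loss is an assumption inferred from the description. The structure it illustrates matches the paragraph above: one NeRF per object, learned layouts that place the objects, and a composite rendering scored against a pretrained text-to-image model.

```python
# Minimal sketch (not the authors' code) of the layout-learning loop described above.
# All class and function names below are hypothetical stand-ins.
import torch
import torch.nn as nn

class ObjectNeRF(nn.Module):
    """Placeholder per-object radiance field: maps 3D points to RGB + density."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4))

    def forward(self, points):  # points: (N, 3) -> (N, 4) color + density
        return self.mlp(points)

class Layout(nn.Module):
    """One learned rigid transform (translation + rotation parameters) per object."""
    def __init__(self, num_objects):
        super().__init__()
        self.translations = nn.Parameter(torch.zeros(num_objects, 3))
        self.rotations = nn.Parameter(torch.zeros(num_objects, 3))  # axis-angle

def render_composite(nerfs, layout, camera):
    """Stub: place each object's NeRF with its layout transform and volume-render
    the composited scene. Here it returns a dummy image so the sketch runs end to end."""
    return torch.rand(1, 3, 64, 64, requires_grad=True) + 0 * layout.translations.sum()

def sds_loss(image, prompt):
    """Stub for a score-distillation-style loss from a pretrained text-to-image model."""
    return image.mean()

num_objects, num_layouts = 4, 3
nerfs = nn.ModuleList([ObjectNeRF() for _ in range(num_objects)])
layouts = nn.ModuleList([Layout(num_objects) for _ in range(num_layouts)])
optimizer = torch.optim.Adam(list(nerfs.parameters()) + list(layouts.parameters()), lr=1e-3)

for step in range(200):
    layout = layouts[step % num_layouts]  # alternate over the learned layouts
    camera = None                         # stand-in for a sampled camera pose
    image = render_composite(nerfs, layout, camera)
    loss = sds_loss(image, "a cozy campsite with a tent and a campfire")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because every layout must yield a plausible image of the same scene, parts that can be rearranged independently while keeping the rendering in-distribution end up grouped into separate NeRFs, which is the paper's notion of an object.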


The method's effectiveness is demonstrated by generating 3D scenes that cleanly separate into individual objects, and the paper provides an interactive demo and results showcasing practical applications of the approach. The simplicity of the method, combined with its success at object-level decomposition, highlights its potential for advancing text-to-3D content creation.

