SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Author: VRAMrod
Published: 3/9/2024, 3:44:40 PM
Category: Research

An agent that converts textual descriptions into Blender code for creating 3D scenes, using LLMs in a dual-loop self-improving pipeline.

arxiv.org

https://arxiv.org/abs/2403.01248

SceneCraft is an agent that converts textual descriptions into executable Blender code, which is then rendered into visually cohesive and contextually accurate 3D scenes. The task requires reasoning about spatial and semantic relationships between objects, which remains a challenge for current large language models (LLMs). SceneCraft pairs a multimodal LLM, GPT-4V, with the professional rendering software Blender to achieve this.

The method has several key components. First, an LLM generates a list of asset names and descriptions from the input text query, and a CLIP-based retriever fetches the matching assets from a large repository of 3D objects. The scene is then decomposed into a set of sub-scenes, each with a title, a list of asset names, and a sub-scene description. This decomposition guides the subsequent scene optimization.
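The retrieval and decomposition steps can be illustrated with a small sketch. Everything here is an assumption for illustration: the SubScene class, the retrieve helper, the asset names, and the three-dimensional vectors (which stand in for real CLIP embeddings) are hypothetical and are not taken from the SceneCraft code.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SubScene:
    """One sub-scene: a title, the assets it uses, and a description."""
    title: str
    asset_names: list[str]
    description: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, repository):
    """Return the repository key whose embedding is most similar to query_vec."""
    return max(repository, key=lambda name: cosine(query_vec, repository[name]))


# Made-up 3-d vectors standing in for CLIP embeddings of stored 3D assets.
repository = {
    "wooden_chair_01": [0.9, 0.1, 0.0],
    "oak_table_03": [0.1, 0.9, 0.0],
    "floor_lamp_02": [0.0, 0.1, 0.9],
}
# Made-up embeddings of the asset descriptions produced by the LLM.
query_embeddings = {
    "chair": [0.8, 0.2, 0.1],
    "table": [0.2, 0.8, 0.0],
}

sub_scene = SubScene(
    title="Dining corner",
    asset_names=["chair", "table"],
    description="A chair pulled up to a small table.",
)
retrieved = {
    name: retrieve(query_embeddings[name], repository)
    for name in sub_scene.asset_names
}
```

In a real pipeline the vectors would come from a CLIP text and image encoder, and the repository would hold thousands of assets rather than three.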

SceneCraft relies on a dual-loop self-improving pipeline. In the outer loop, the agent plans a scene graph for each sub-scene and writes the code and arguments for the scene. In the inner loop, it runs a constraint-based search, then renders, critiques, and revises the result. For each query, the agent stores the function updates it made for each relation and uses them to update the library. Library learning consists of reviewing the gradual changes to these functions, detecting common patterns, and merging the changes into the global skill library.
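The control flow of the two loops can be sketched in a heavily simplified form. This is a toy model under stated assumptions, not the paper's implementation: each relation is reduced to a single numeric "gap" parameter, the critic is a plain threshold check standing in for the multimodal LLM that inspects a rendered image, and merging is a simple mean. The names critique, refine, and learn_library are hypothetical.

```python
from statistics import mean


def critique(gap, required_gap):
    """Toy critic: return feedback text, or None if the layout looks fine."""
    return "objects too close" if gap < required_gap else None


def refine(gap, required_gap, step=0.25, max_iters=10):
    """Inner loop: evaluate, critique, revise. Returns every gap tried."""
    trace = [gap]
    for _ in range(max_iters):
        if critique(trace[-1], required_gap) is None:
            break
        trace.append(trace[-1] + step)  # revision: widen the gap
    return trace


def learn_library(library, queries):
    """Outer loop: refine per query, then merge the revisions into the library."""
    updates = {}
    for relation, required_gap in queries:
        trace = refine(library[relation], required_gap)
        if len(trace) > 1:  # the relation was revised for this query
            updates.setdefault(relation, []).append(trace[-1])
    merged = dict(library)
    for relation, finals in updates.items():
        merged[relation] = mean(finals)  # toy stand-in for pattern merging
    return merged


library = {"next_to": 0.0, "on_top_of": 0.0}
# Each query names a relation and the gap the toy critic will accept.
queries = [("next_to", 0.5), ("next_to", 1.0), ("on_top_of", 0.0)]
new_library = learn_library(library, queries)
```

Here the two "next_to" queries end at gaps of 0.5 and 1.0, so the merged library stores 0.75, while "on_top_of" is never revised and keeps its original value.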

What most distinguishes SceneCraft is that it autonomously generates Python code that translates the spatial relations within a scene into precise numerical constraints. This goes beyond existing systems that depend on predefined templates and rules, or that are restricted by the available 3D scene data. SceneCraft instead learns general spatial planning skills from a small number of synthetic queries, which sets it apart from previous work on 3D scene generation.
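To make "spatial relations as numerical constraints" concrete, here is a minimal sketch of what such a constraint and a search over it might look like. It is an assumption-laden illustration: the Box class, the on_top_of predicate, and the grid search in solve are hypothetical, use plain Python rather than Blender's API, and are not the functions SceneCraft generates.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: centre (x, y, z) and size (w, d, h)."""
    x: float
    y: float
    z: float
    w: float
    d: float
    h: float


def on_top_of(a, b, tol=1e-6):
    """True if a rests on b: a's bottom meets b's top, centre inside b's footprint."""
    bottom_a = a.z - a.h / 2
    top_b = b.z + b.h / 2
    within = abs(a.x - b.x) <= b.w / 2 and abs(a.y - b.y) <= b.d / 2
    return within and abs(bottom_a - top_b) <= tol


def solve(support, size, xs, zs):
    """Grid search: return the first placement that satisfies on_top_of."""
    for x, z in product(xs, zs):
        candidate = Box(x, 0.0, z, *size)
        if on_top_of(candidate, support):
            return candidate
    return None


table = Box(0.0, 0.0, 0.4, 1.2, 0.8, 0.8)  # table top is at z = 0.8
lamp = solve(table, (0.2, 0.2, 0.4), xs=[-1.0, 0.0, 1.0], zs=[0.0, 0.5, 1.0, 1.5])
```

The search rejects x = -1.0 and x = 1.0 because they fall outside the table's footprint, and settles on the lamp centred at x = 0.0, z = 1.0, where its base touches the table top.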