BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Uses vision-language models to automate the design process in graphics applications like Blender, helping users generate the complex sequences of program edits that design tasks require.
The BlenderAlchemy system leverages vision-language models to iteratively refine visual programs in the Blender 3D design environment. It decomposes the initial Blender state into a base state and a set of initial programs that, when applied to the base state, reproduce the initial environment. Each initial program focuses on a specific part of the 3D graphical design workflow, such as material editing or lighting setup, and a dynamics function within Blender transitions the scene from one state to another as programs execute.
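As a rough sketch, the decomposition and dynamics can be pictured as follows. All names here (`BlenderState`, `dynamics`, the toy `scene` dict) are illustrative placeholders, not the paper's actual interfaces; the real system executes Blender Python (bpy) scripts inside the design environment.

```python
from dataclasses import dataclass, field

@dataclass
class BlenderState:
    """Abstract snapshot of a Blender scene (objects, materials, lights)."""
    scene: dict = field(default_factory=dict)

def dynamics(state: BlenderState, program: str) -> BlenderState:
    """Transition function: executing a visual program against a state
    yields a new state. Here a program is Python source that mutates a
    toy `scene` dict; in Blender it would call the bpy API instead."""
    new_state = BlenderState(scene=dict(state.scene))
    exec(program, {"scene": new_state.scene})
    return new_state

# The initial environment is recovered by applying each initial program
# (e.g., one for materials, one for lighting) to the shared base state.
base_state = BlenderState()
initial_programs = {
    "material": "scene['material'] = {'roughness': 0.5, 'metallic': 0.1}",
    "lighting": "scene['light'] = {'energy': 1000, 'angle': 45}",
}
state = base_state
for name, program in initial_programs.items():
    state = dynamics(state, program)
```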
To refine individual visual programs, the system searches for edited versions of the initial programs that better align with user intentions. It does so with two kinds of edits: tweak edits, small numerical adjustments that explore a program's local "neighborhood," and leap edits, more drastic changes that jump to distant parts of the program space. Because the optimal sequence of edits often mixes the two, the system cycles between restricting candidate edits to the program's neighborhood and searching over the entire program space, as sketched below.
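A minimal sketch of this alternating search, assuming hypothetical `tweak` and `leap` operators and a `score` callback standing in for the VLM-based visual state evaluator (in the actual system, both edit proposal and scoring are performed by vision-language models):

```python
import random
import re

# Stand-in pool of drastic rewrites; a VLM would generate these.
CANDIDATE_REWRITES = [
    "scene['material'] = {'roughness': 0.9, 'metallic': 0.8}",
    "scene['material'] = {'roughness': 0.2, 'metallic': 0.0}",
]

def tweak(program: str) -> str:
    """Tweak edit: nudge one numeric literal slightly, mimicking a
    VLM-proposed small numerical adjustment."""
    def nudge(m):
        return f"{float(m.group()) * random.uniform(0.9, 1.1):.3f}"
    return re.sub(r"\d+\.?\d*", nudge, program, count=1)

def leap(program: str) -> str:
    """Leap edit: replace the program wholesale (the input is ignored
    here, mimicking a structurally different VLM rewrite)."""
    return random.choice(CANDIDATE_REWRITES)

def refine(program: str, score, rounds: int = 4, width: int = 4) -> str:
    """Cycle between neighborhood search (tweaks) and full program-space
    search (leaps), keeping the candidate the evaluator scores highest."""
    best = program
    for r in range(rounds):
        edit = tweak if r % 2 == 0 else leap
        candidates = [best] + [edit(best) for _ in range(width)]
        best = max(candidates, key=score)
    return best

# Example: a toy evaluator that prefers rougher materials; a real
# evaluator would render each candidate and query a VLM.
refined = refine(
    "scene['material'] = {'roughness': 0.5, 'metallic': 0.1}",
    score=lambda p: float(re.search(r"roughness': (\d+\.?\d*)", p).group(1)),
)
```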
BlenderAlchemy incorporates visual imagination by giving the system access to text-to-image models that guide program edits based on textual user intentions, with prompts steering both the visual state evaluator and the edit generator as they refine programs. By synthesizing candidate materials and rendering them in the Blender design space, BlenderAlchemy enables users to create a variety of visual outputs aligned with their intentions.
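The imagination step can be sketched as scoring candidate renders against an imagined reference image. The functions below (`imagine_target`, `render`, and their toy feature vectors) are stand-ins for the text-to-image model, the Blender renderer, and an image encoder; they are not the system's actual components:

```python
import math

def imagine_target(intention: str) -> list[float]:
    """Stand-in for a text-to-image model: map the textual intention to
    a feature vector (a real system would synthesize a reference image)."""
    return [float(ord(c) % 7) for c in intention[:8]]

def render(program: str) -> list[float]:
    """Stand-in for rendering the program's output in Blender and
    embedding the resulting image."""
    return [float(ord(c) % 7) for c in program[:8]]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb or 1.0)

def imagination_score(program: str, intention: str) -> float:
    """Imagination-guided evaluation: compare a candidate program's
    render against the imagined target for the user's intention."""
    return cosine(render(program), imagine_target(intention))

# Pick the candidate material whose render best matches the imagined target.
candidates = [
    "scene['material'] = {'roughness': 0.9, 'metallic': 0.8}",
    "scene['material'] = {'roughness': 0.2, 'metallic': 0.0}",
]
best = max(candidates, key=lambda p: imagination_score(p, "brushed gold metal"))
```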