OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation
Create editable 3D models and semantically continuous, time-unlimited 3D videos from a single image using Gaussian Splatting and Stable Diffusion models.
The OneTo3D method generates editable dynamic 3D models and videos from a single image. The approach involves three main phases: generating an initial 3D model from the input image, creating and binding a self-adapting armature to the model, and interpreting text commands to control the model's motions and actions. The initial 3D model is derived from the input image by analyzing the subject's body pose and shape; background removal is applied first to isolate the main object for further processing.
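The paper does not specify which background-removal tool is used; as a minimal sketch, the preprocessing step could look like the following, assuming the `rembg` library and Pillow are available. The function name and file paths are illustrative only.

```python
# Hedged sketch of the background-removal step, assuming `rembg` and Pillow.
from rembg import remove
from PIL import Image

def isolate_main_object(image_path: str, output_path: str) -> None:
    """Strip the background so only the main subject remains for 3D reconstruction."""
    with Image.open(image_path) as img:
        cutout = remove(img)      # RGBA image with a transparent background
        cutout.save(output_path)  # save as PNG to preserve the alpha channel

isolate_main_object("input.jpg", "subject_rgba.png")
```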
To enable precise control over the 3D model's motions and actions, a self-adapting armature is generated and bound to the model, as sketched below. The armature is designed to support detailed, dynamic editing of the model's poses. A text-to-motion and action interpretation mechanism lets users enter commands that dictate specific movements and actions for the 3D model; an interpreter function processes these commands, extracting action details and mapping them to the corresponding body parts of the model for execution.
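To make the armature-binding step concrete, here is a minimal sketch using Blender's Python API (bpy). The two-bone layout and bone names are illustrative stand-ins, not the self-adapting structure the paper derives from detected key points.

```python
# Hedged sketch: create a simple armature and bind a mesh to it in Blender.
import bpy

def bind_simple_armature(mesh_obj: bpy.types.Object) -> bpy.types.Object:
    # Create an armature with a single default bone and stay in edit mode.
    bpy.ops.object.armature_add(enter_editmode=True, location=(0, 0, 0))
    armature = bpy.context.object

    # Add one illustrative child bone ("spine"); a real pipeline would place
    # bones at detected key-point locations instead.
    ebones = armature.data.edit_bones
    root = ebones[0]
    spine = ebones.new("spine")
    spine.head = root.tail
    spine.tail = (0, 0, 2)
    spine.parent = root
    bpy.ops.object.mode_set(mode='OBJECT')

    # Parent the mesh to the armature with automatic weights so that
    # rotating bones deforms the mesh.
    bpy.ops.object.select_all(action='DESELECT')
    mesh_obj.select_set(True)
    armature.select_set(True)
    bpy.context.view_layer.objects.active = armature
    bpy.ops.object.parent_set(type='ARMATURE_AUTO')
    return armature
```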
The implementation of OneTo3D relies on traditional explicit methods for controlling and editing 3D models, using the Blender framework for 3D editing. Key-point detection tools inform the armature-generation algorithm, ensuring accurate and adaptable armature structures. The interpreter mechanism analyzes user commands, extracting action details and translating them into specific motions and actions for the 3D model, as illustrated in the sketch below. Overall, OneTo3D offers a novel approach to generating editable dynamic 3D models and videos from single images, with a focus on precise control and seamless motion execution.
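The sketch below illustrates the general shape of such a command interpreter; the `Action` dataclass, keyword table, and bone names are hypothetical and do not reproduce OneTo3D's actual parser.

```python
# Toy illustration of mapping text commands to per-bone actions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    body_part: str    # armature bone the motion applies to
    motion: str       # e.g. "raise", "bend"
    angle_deg: float  # target rotation for the keyframe

# Hypothetical mapping from command phrases to actions.
COMMAND_TABLE = {
    "raise left arm":  Action("upper_arm.L", "raise", 90.0),
    "raise right arm": Action("upper_arm.R", "raise", 90.0),
    "bend left knee":  Action("shin.L",      "bend",  45.0),
}

def interpret(command: str) -> Optional[Action]:
    """Return the first action whose keyword phrase appears in the command."""
    text = command.lower()
    for phrase, action in COMMAND_TABLE.items():
        if phrase in text:
            return action
    return None

print(interpret("raise left arm slowly"))
# Action(body_part='upper_arm.L', motion='raise', angle_deg=90.0)
```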