Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
3D object synthesis that ensures 3D consistency, visual quality, and efficiency through a training-free 3D Adapter and ancestral sampling.
MVEdit is a generic method for adapting 2D diffusion models into 3D diffusion pipelines, addressing the challenge of achieving 3D-consistent multi-view ancestral sampling without sacrificing sharp details. The key innovation is a training-free 3D Adapter that leverages off-the-shelf ControlNets and a robust NeRF/mesh optimization scheme: at each denoising step, the intermediate multi-view images are fused into a NeRF or mesh, and renderings of that 3D representation condition the next step through the ControlNets, enforcing precise 3D consistency while the pretrained image diffusion model acts as a multi-view generator. The paper also introduces StableSSDNeRF, a domain-specific text-to-3D initialization method that follows previously unseen prompts and produces diverse, high-quality, photorealistic 3D objects in a short time.
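To make the adapter's control loop concrete, the following is a minimal, self-contained sketch of 3D-Adapter-style multi-view ancestral sampling. Every function here (denoise_views, fit_radiance_field, render_views) and the noise schedule are illustrative stand-ins for the paper's components (2D UNet + ControlNet, NeRF/mesh optimization, rendering); this is not MVEdit's actual implementation or API.

```python
# A toy, runnable skeleton of 3D-consistent multi-view ancestral sampling.
# All components below are hypothetical stand-ins, not the MVEdit codebase.
import torch

def denoise_views(x_t, t, cond_images):
    """Stand-in for the 2D diffusion UNet + ControlNet: predicts clean views,
    conditioned on renderings of the current 3D state (the ControlNet input)."""
    return 0.5 * x_t + 0.5 * cond_images  # placeholder prediction

def fit_radiance_field(views):
    """Stand-in for the robust NeRF/mesh optimization that fuses the
    per-view predictions into one 3D representation."""
    return views.mean(dim=0, keepdim=True)  # placeholder "3D state"

def render_views(state, n_views):
    """Stand-in for re-rendering the fused 3D representation at each camera."""
    return state.expand(n_views, -1, -1, -1)

n_views, steps = 8, 50
x_t = torch.randn(n_views, 3, 64, 64)  # per-view noisy images/latents
cond = torch.zeros_like(x_t)           # initial ControlNet conditioning

for t in reversed(range(steps)):
    x0_views = denoise_views(x_t, t, cond)  # independent 2D denoising per view
    state = fit_radiance_field(x0_views)    # fuse the views in 3D
    cond = render_views(state, n_views)     # 3D-consistent renderings
    noise = torch.randn_like(x_t) if t > 0 else 0.0
    alpha = t / steps                       # toy schedule, not the paper's
    x_t = (1 - alpha) * cond + alpha * noise  # ancestral-style update
```

The design choice this sketch mirrors is that consistency is enforced between denoising steps rather than by retraining the diffusion model, which is what makes the adapter training-free.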
Built around this ControlNet-based 3D Adapter, which is key to producing crisp textures and refined geometric details, MVEdit achieves state-of-the-art performance in both image-to-3D and texture generation. The same adapter also drives text-guided 3D-to-3D and instruct 3D-to-3D pipelines, yielding prompt-accurate appearances, intricate textures, and detailed geometry, underscoring the versatility of the approach.
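As an illustration of how a single adapter loop might serve these different tasks, here is a hypothetical dispatch sketch in which only the initial 3D state and the conditioning vary; every function in it is a stub invented for this example, not part of the MVEdit codebase.

```python
# Hypothetical pipeline dispatch: one shared 3D-Adapter sampling loop,
# different initializations per task. All functions below are stubs.

def mvedit_sample(init_state, prompt):
    return f"3D asset from {init_state!r} guided by {prompt!r}"  # stub loop

def lift_single_image(image):
    return ("coarse-3d-from-image", image)  # stub single-view lifting

def stable_ssdnerf(prompt):
    return ("ssdnerf-init", prompt)  # stub domain-specific text-to-3D init

def run_pipeline(task, prompt=None, image=None, mesh=None):
    if task == "image_to_3d":
        init = lift_single_image(image)   # start from a single input view
    elif task == "text_to_3d":
        init = stable_ssdnerf(prompt)     # fast domain-specific initialization
    else:  # "3d_to_3d", "instruct_3d_to_3d", "texture"
        init = mesh                       # start from an existing 3D asset
    return mvedit_sample(init, prompt)    # shared 3D-Adapter sampling loop

print(run_pipeline("text_to_3d", prompt="a photorealistic sports car"))
```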