GeoDiffuser: Geometry-Based Image Editing with Diffusion Models

Author: VRAMrod
Published: 4/23/2024, 5:00:58 AM
Category: Research

A zero-shot, optimization-based method for 2D and 3D object editing in images that leverages geometric transformations.

Paper

https://arxiv.org/abs/2404.14403

Project

https://ivl.cs.brown.edu/research/geodiffuser.html

GeoDiffuser is a method for editing objects in images with diffusion models. It unifies several image-based object editing capabilities into a single approach by incorporating geometric transformations directly within the shared attention layers of the diffusion model. This allows for realistic edits that preserve the style of the object. The method can perform common 2D and 3D edits such as object translation, 3D rotation, and removal without any additional training.

GeoDiffuser is a zero-shot, optimization-based method that can support any diffusion model with attention layers. It formulates image editing as geometric transformations of parts of the image, which lets it express common user-specified editing operations in one framework. The approach relies on the attention layers to capture local and global image interactions, so that edits come with plausible lighting, shadows, and reflections, and disoccluded image regions are inpainted.
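
To make the idea of "editing as a geometric transformation" concrete, here is a minimal NumPy sketch of the simplest case, a 2D translation of a segmented foreground object. It is an illustration only, not the paper's implementation: the function name `translate_mask`, the toy 6x6 mask, and the integer pixel offsets are all assumptions. It also shows where the disoccluded region (the background revealed when the object moves) comes from.

```python
import numpy as np

def translate_mask(mask: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Move a boolean foreground mask by (dx, dy) pixels.

    Hypothetical helper: the translation is written as a 3x3 homogeneous
    matrix so that other 2D transforms could be swapped in the same way.
    """
    h, w = mask.shape
    # Homogeneous 2D translation (x points right, y points down).
    T = np.array([[1, 0, dx],
                  [0, 1, dy],
                  [0, 0, 1]])
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys, np.ones_like(xs)])  # shape (3, N)
    new_x, new_y, _ = T @ coords
    # Keep only the pixels that still land inside the image.
    inside = (new_x >= 0) & (new_x < w) & (new_y >= 0) & (new_y < h)
    moved = np.zeros_like(mask)
    moved[new_y[inside], new_x[inside]] = True
    return moved

# Toy example: a 2x2 object moved one pixel to the right.
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:3] = True
moved = translate_mask(mask, dx=1, dy=0)

# Pixels the object used to cover but no longer does: these are the
# disoccluded background pixels that would need to be inpainted.
disoccluded = mask & ~moved
```

In this toy case the object keeps its four pixels and leaves behind a two-pixel column of background to fill in.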

The general editing framework starts by inverting the input image to obtain a noise latent, which serves as the starting point for both regenerating and editing the image. Two diffusion processes then run side by side, a reference process and an edit process, and attention is shared between them to achieve the desired edit. By incorporating geometric transforms directly within these attention layers, GeoDiffuser can handle a wide range of image editing operations, and the authors report better results than previous methods.
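
The role of the geometric transform inside shared attention can be sketched with a toy example. This is a simplified illustration under stated assumptions, not GeoDiffuser's actual attention code: the function `shared_attention_edit` is hypothetical, the transform is reduced to a horizontal shift on a small token grid, and random arrays stand in for real diffusion features. The edit branch reuses the reference keys and values, and its queries are the reference queries moved by the transform, so each edited location retrieves the features of the location it came from.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over flattened spatial tokens."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def shared_attention_edit(q_ref, k_ref, v_ref, shift, grid):
    """Toy edit branch that shares keys and values with the reference.

    Assumption: the geometric transform is a horizontal shift of `shift`
    tokens on an (h, w) grid. np.roll wraps around at the border, which
    a real warp would not do.
    """
    h, w = grid
    q_grid = q_ref.reshape(h, w, -1)
    q_warped = np.roll(q_grid, shift, axis=1).reshape(h * w, -1)
    return attention(q_warped, k_ref, v_ref)

# Random stand-ins for reference-branch features on a 4x4 token grid.
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
q = rng.normal(size=(h * w, d))
k = rng.normal(size=(h * w, d))
v = rng.normal(size=(h * w, d))

ref_out = attention(q, k, v)
edit_out = shared_attention_edit(q, k, v, shift=1, grid=(h, w))
```

Because each query row is processed independently, the edit output here is exactly the reference output shifted by one token, which is the behaviour the warp is meant to produce: the object's appearance is carried to its new position rather than regenerated from scratch.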

GeoDiffuser supports geometric edits to segmented foreground objects in natural or generated images. Users specify the transformation of a foreground object through sliders. Because the transformation is applied directly within the shared attention layers of the diffusion model, the edited object keeps its original style.
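
As a rough illustration of how slider values could map to a transformation, the sketch below builds a 4x4 rigid transform from a rotation slider and three translation sliders. The slider names (`yaw_deg`, `tx`, `ty`, `tz`), the choice of the y axis as the rotation axis, and the function `sliders_to_transform` are assumptions made for this example and are not taken from the GeoDiffuser interface.

```python
import numpy as np

def sliders_to_transform(yaw_deg=0.0, tx=0.0, ty=0.0, tz=0.0):
    """Build a 4x4 rigid transform from hypothetical slider values.

    yaw_deg rotates about the vertical (y) axis; tx, ty, tz translate.
    """
    t = np.deg2rad(yaw_deg)
    c, s = np.cos(t), np.sin(t)
    M = np.eye(4)
    # Rotation about the y axis.
    M[:3, :3] = [[c, 0.0, s],
                 [0.0, 1.0, 0.0],
                 [-s, 0.0, c]]
    # Translation goes in the last column.
    M[:3, 3] = [tx, ty, tz]
    return M

# Rotate a point on the x axis by 90 degrees, then shift it along x.
M = sliders_to_transform(yaw_deg=90.0, tx=0.5)
point = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous coordinates
moved = M @ point
```

With all sliders at zero the function returns the identity matrix, which corresponds to leaving the object where it is.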