Editable Image Elements for Controllable Synthesis
An image representation that facilitates spatial editing of input images using diffusion models.
Advancements in diffusion models have improved text-guided synthesis tasks. Yet editing user-provided images remains challenging, because the high-dimensional noise input space of diffusion models is not naturally suited to image inversion or spatial manipulation. This paper introduces a novel image representation that facilitates spatial editing of input images within a diffusion model framework.
Central to this method is the encoding of input images into discrete "image elements" via a convolutional encoder. These elements, characterized by centroid and size parameters, closely mimic superpixels and support intuitive user edits such as resizing, rearranging, or removal. After the user manipulates the elements, a diffusion-based decoder synthesizes a realistic image that honors the edits.
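The following is a minimal sketch of what such an element representation and its edit operations might look like. The names (`ImageElement`, `move`, `resize`, `remove`), the normalized coordinate convention, and the feature field are all assumptions for illustration; the paper's actual encoder is a learned convolutional network, which is not reproduced here.

```python
# Hypothetical sketch of the "image element" representation described above.
from dataclasses import dataclass, replace
import numpy as np

@dataclass(frozen=True)
class ImageElement:
    centroid: tuple[float, float]  # (x, y), assumed normalized to [0, 1]
    size: tuple[float, float]      # (width, height), also normalized
    feature: np.ndarray            # appearance embedding from the encoder (assumed)

def move(elem: ImageElement, dx: float, dy: float) -> ImageElement:
    """Rearrange: translate an element's centroid."""
    x, y = elem.centroid
    return replace(elem, centroid=(x + dx, y + dy))

def resize(elem: ImageElement, scale: float) -> ImageElement:
    """Resize: scale an element's spatial extent."""
    w, h = elem.size
    return replace(elem, size=(w * scale, h * scale))

def remove(elements: list[ImageElement], idx: int) -> list[ImageElement]:
    """Remove: drop an element; the diffusion decoder later fills the vacated region."""
    return [e for i, e in enumerate(elements) if i != idx]
```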
The method enables a wide range of editing tasks, including object resizing, rearrangement, de-occlusion, removal, object variations, and composition. Object variations and removal are accomplished through inpainting guided by text prompts and the remaining image elements. The model also supports image composition: elements from another image can be seamlessly inserted, while the original elements in overlapping regions are eliminated.
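Building on the `ImageElement` sketch above, a hedged sketch of the composition edit might look as follows. The axis-aligned box overlap test is an assumption; the paper does not specify how overlapping regions are determined, and the decoder call that renders the final image is omitted.

```python
def boxes_overlap(a: ImageElement, b: ImageElement) -> bool:
    """Approximate each element by an axis-aligned box around its centroid (assumed)."""
    (ax, ay), (aw, ah) = a.centroid, a.size
    (bx, by), (bw, bh) = b.centroid, b.size
    return abs(ax - bx) < (aw + bw) / 2 and abs(ay - by) < (ah + bh) / 2

def compose(target: list[ImageElement],
            inserted: list[ImageElement]) -> list[ImageElement]:
    """Keep target elements not overlapped by any inserted element, then append the inserts."""
    kept = [t for t in target if not any(boxes_overlap(t, s) for s in inserted)]
    return kept + inserted
```

The resulting element list would then be passed to the diffusion-based decoder to synthesize the composed image.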

