DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
A tool for fine-grained image editing with text-to-image diffusion models that addresses accuracy issues and limited flexibility by incorporating image prompts, stochastic differential equations, and score-based gradient guidance.
The introduction of large-scale text-to-image (T2I) diffusion models has been a game-changer in image generation. However, while these models excel at producing diverse, high-quality images, fine-grained image editing still poses challenges. DiffEditor is a solution aimed at rectifying these shortcomings and enhancing the capabilities of diffusion-based image editing.
DiffEditor tackles two key weaknesses of existing diffusion-based image editing methods. First, in complex scenarios, editing results often lack accuracy and exhibit unexpected artifacts. Second, existing methods lack the flexibility to harmonize editing operations, for example, imagining new content within the edited region.
One of the primary enhancements introduced by DiffEditor is the use of image prompts in fine-grained image editing. Working alongside text prompts, an image prompt describes the intended edit more precisely, leading to more accurate and contextually relevant results. Additionally, to boost flexibility while preserving content consistency, DiffEditor injects stochastic differential equation (SDE) sampling into the otherwise deterministic ordinary differential equation (ODE) sampling process, as sketched below.
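To make the ODE/SDE mixing concrete, here is a minimal sketch of one reverse-diffusion step in the standard DDIM parameterization, where an eta parameter interpolates between deterministic ODE sampling (eta = 0) and stochastic, SDE-style sampling (eta > 0). The function name and signature are illustrative, not DiffEditor's actual code; abar_t and abar_prev denote the cumulative noise-schedule products at the current and next timesteps.

```python
import math
import torch

def ddim_step(x_t, eps_pred, abar_t, abar_prev, eta=0.0):
    """One reverse-diffusion step (DDIM parameterization, sketch).

    eta = 0 gives the deterministic ODE sampler; eta > 0 mixes in
    fresh noise, recovering SDE-style ancestral sampling at eta = 1.
    """
    # Predict the clean image x_0 from the noisy sample and noise estimate.
    x0_pred = (x_t - math.sqrt(1 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Per-step noise scale; collapses to 0 (pure ODE) when eta == 0.
    sigma = eta * math.sqrt((1 - abar_prev) / (1 - abar_t)) \
                * math.sqrt(1 - abar_t / abar_prev)
    # Deterministic direction pointing back toward the noise in x_t.
    dir_xt = math.sqrt(1 - abar_prev - sigma**2) * eps_pred
    return math.sqrt(abar_prev) * x0_pred + dir_xt + sigma * torch.randn_like(x_t)
```

The key term is sigma: with eta = 0 the step is fully deterministic and maximally content-preserving, while eta > 0 injects fresh noise that gives the sampler room to imagine new content. DiffEditor applies this stochasticity selectively during editing rather than uniformly across all steps.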
DiffEditor additionally employs regional score-based gradient guidance and a time travel strategy in diffusion sampling, further elevating the quality of editing outcomes. These enhancements enable DiffEditor to efficiently achieve state-of-the-art performance in various fine-grained image editing tasks, including object moving, resizing, appearance replacing, content dragging, and object pasting.
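The remaining two components can be sketched in the same spirit. Both functions below are hypothetical illustrations rather than DiffEditor's actual implementation: guided_eps shows the general pattern of score-based gradient guidance confined to a region mask, travel_back shows the re-noising step behind a time-travel (rollback-and-refine) strategy, and energy_fn stands in for whatever editing objective is being optimized.

```python
import torch

def guided_eps(model, x_t, t, mask, energy_fn, scale=1.0):
    """Regional score-based guidance (sketch): steer the model's noise
    estimate with the gradient of an editing energy, applied only inside
    the edited region given by `mask` (1 = editable, 0 = frozen)."""
    x_in = x_t.detach().requires_grad_(True)
    with torch.enable_grad():
        eps = model(x_in, t)
        energy = energy_fn(x_in, eps)  # scalar editing objective
        grad = torch.autograd.grad(energy, x_in)[0]
    # Adding the energy gradient to eps lowers the energy of the
    # predicted clean image; the mask confines the effect regionally.
    return eps.detach() + scale * mask * grad

def travel_back(x_t, abar_t, abar_s):
    """Time travel (sketch): re-noise x_t from its current noise level
    (abar_t) back to a noisier point in the trajectory (abar_s < abar_t),
    so the next denoising steps can revisit and refine the edit."""
    ratio = abar_s / abar_t
    return ratio**0.5 * x_t + (1 - ratio)**0.5 * torch.randn_like(x_t)
```

In a sampling loop, guided_eps would replace the raw model call at each step, and travel_back would occasionally jump the sample back to a noisier timestep so that a few extra denoising steps can clean up artifacts in the edited region.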