ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Enhances the controllability of text-to-image diffusion models by optimizing pixel-level cycle consistency between generated images and conditional controls.
ControlNet++ is a method for controllable image generation with text-to-image diffusion models. It addresses a key shortcoming of existing methods: the images they generate often fail to align accurately with the given conditional controls.
Unlike previous approaches, which achieve controllability only implicitly through latent-space denoising objectives, ControlNet++ optimizes controllability explicitly at the pixel level. The key idea is to optimize pixel-level cycle consistency between generated images and conditional controls: a discriminative reward model re-extracts the condition from the generated image, and the discrepancy from the input condition is penalized.
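A minimal sketch of this cycle-consistency loss, assuming the reward model is a frozen segmentation network and the condition is a segmentation mask; the function names and the choice of cross-entropy here are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(generated_images, condition, reward_model):
    """Pixel-level cycle consistency: re-extract the condition from the
    generated image with a frozen discriminative reward model and compare
    it to the input condition (illustrative sketch)."""
    # Assumed shapes: reward_model maps images to per-pixel class logits
    # (B, C, H, W); condition holds per-pixel class indices (B, H, W).
    predicted_condition = reward_model(generated_images)
    return F.cross_entropy(predicted_condition, condition)
```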
Computing this pixel-level loss naively would require generating images from random noise via full multi-step sampling, which is costly in time and memory. Instead, ControlNet++ adds noise to the input training images, deliberately disturbing their consistency with the conditions, and then performs an efficient single-step denoising to reconstruct a clean image on which the consistency loss is computed, avoiding the overhead of multiple sampling steps.
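A rough PyTorch sketch of this efficient reward strategy, reusing the cycle-consistency loss above. The scheduler is assumed to be a DDPM-style noise scheduler exposing `add_noise` and `alphas_cumprod` (as in common diffusion libraries), and `unet` is a condition-aware noise predictor; all interfaces are illustrative assumptions rather than the paper's actual code:

```python
import torch

def efficient_reward_step(unet, scheduler, reward_model, x0, condition, t_max=200):
    """One training step of the single-step reward strategy (illustrative)."""
    # Sample modest timesteps so a single denoising step can recover x0 well.
    t = torch.randint(0, t_max, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = scheduler.add_noise(x0, noise, t)  # disturb image/condition consistency

    eps_pred = unet(x_t, t, condition)       # conditional noise prediction

    # Single-step estimate of the clean image from the noise prediction:
    # x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps  =>  solve for x0.
    alpha_bar = scheduler.alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    x0_pred = (x_t - (1.0 - alpha_bar).sqrt() * eps_pred) / alpha_bar.sqrt()

    # Pixel-level cycle-consistency reward loss (see sketch above).
    loss = cycle_consistency_loss(x0_pred, condition, reward_model)
    loss.backward()
    return loss
```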
By combining discriminative reward models with this efficient reward strategy, ControlNet++ achieves significant improvements in controllability across a range of conditions: it outperforms ControlNet by notable margins under segmentation-mask, line-art-edge, and depth conditions (measured by metrics such as mIoU, SSIM, and RMSE, respectively).