InstanceDiffusion: Instance-level Control for Image Generation
Offers precise control over individual instances within generated images, along with multiple ways to specify instance locations and conditions
Discover the latest breakthrough in text-to-image generation: InstanceDiffusion. Unlike traditional models, which produce images without fine-grained control, InstanceDiffusion lets users dictate the location and attributes of each instance within generated images with unprecedented precision.
At the heart of InstanceDiffusion are three innovations: UniFusion, ScaleU, and Multi-instance Sampler. UniFusion integrates instance-level conditions into the diffusion model, letting users specify instance locations through points, scribbles, bounding boxes, or detailed segmentation masks. ScaleU enhances image fidelity by recalibrating features within the UNet so the model adheres more faithfully to the specified layout conditions. Multi-instance Sampler reduces information leakage between the conditions of multiple instances, ensuring each instance is generated accurately.
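The announcement itself includes no code, so here is a hedged, illustrative Python sketch of what per-instance conditions look like as data: each instance pairs its own prompt with one of the four location formats UniFusion accepts. The dict layout, the `InstanceDiffusionPipeline` name, and its arguments are assumptions for illustration, not the project's actual interface.

```python
import torch

# Each instance pairs its own text prompt with one of the supported location
# formats; coordinates are normalized to [0, 1]. This dict layout is an
# illustrative assumption, not the project's actual input schema.
instances = [
    {"prompt": "a corgi wearing sunglasses",  # bounding box: x0, y0, x1, y1
     "box": [0.05, 0.40, 0.45, 0.95]},
    {"prompt": "a red kite in the sky",       # single point: x, y
     "point": [0.70, 0.15]},
    {"prompt": "a winding dirt path",         # scribble: ordered (x, y) points
     "scribble": [[0.10, 0.98], [0.35, 0.80], [0.60, 0.72], [0.90, 0.65]]},
    {"prompt": "a large oak tree",            # binary segmentation mask, H x W
     "mask": torch.zeros(64, 64, dtype=torch.bool)},
]

# Hypothetical pipeline call (the class name and arguments are assumptions):
# pipe = InstanceDiffusionPipeline.from_pretrained("instancediffusion")
# image = pipe(caption="a corgi on a beach at sunset",
#              instances=instances).images[0]
for spec in instances:
    location_keys = [k for k in spec if k != "prompt"]
    print(spec["prompt"], "->", location_keys)
```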
InstanceDiffusion surpasses specialized state-of-the-art models by significant margins on key metrics: on the COCO dataset, it delivers a 20.4% improvement in AP50^box for box inputs and a 25.4% improvement in IoU for mask inputs.
It also supports iterative, multi-round image generation. Keeping the initial noise and image caption identical, InstanceDiffusion can progressively add, edit, or replace instances within generated images while leaving the rest of the scene stable, offering a flexible workflow for creative editing.
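As a rough sketch of this multi-round workflow, again using the hypothetical `pipe` from above and assuming a Stable-Diffusion-style latent shape, the key mechanism is reusing the same initial latent noise and caption while growing the instance list between rounds:

```python
import torch

# Reusing identical initial noise across rounds is what keeps the
# already-generated regions stable between edits.
generator = torch.Generator().manual_seed(1234)
latents = torch.randn(1, 4, 64, 64, generator=generator)  # assumed latent shape

caption = "a corgi on a beach at sunset"
rounds = [
    [],  # round 0: global caption only
    [{"prompt": "a corgi wearing sunglasses", "box": [0.05, 0.40, 0.45, 0.95]}],
    [{"prompt": "a corgi wearing sunglasses", "box": [0.05, 0.40, 0.45, 0.95]},
     {"prompt": "a red kite in the sky", "point": [0.70, 0.15]}],  # add a kite
]

for i, instances in enumerate(rounds):
    # Hypothetical call, reusing the same latents and caption every round:
    # image = pipe(caption=caption, instances=instances, latents=latents).images[0]
    # image.save(f"round_{i}.png")
    print(f"round {i}: {len(instances)} instance condition(s)")
```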
Comments
There is now a ComfyUI implementation of this