ElasticDiffusion: Training-free Arbitrary Size Image Generation
A training-free decoding method for pretrained text-to-image diffusion models that decouples local and global content generation to achieve coherent images across arbitrary sizes and aspect ratios.
ElasticDiffusion sets out to advance image generation by offering a training-free decoding method that frees pretrained text-to-image diffusion models from their fixed training sizes and aspect ratios. The method estimates local content from smaller patches, giving fine-grained control over low-level pixel detail, while global content, which preserves the overall structure, is computed from a reference latent obtained by downsampling. To maintain the aspect ratio of the input latent, a padding strategy with a constant-color background is employed.
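To make the decoupling concrete, here is a minimal sketch of one denoising step in this spirit: a global direction computed on a downsampled reference latent and a local prediction stitched from overlapping native-size patches. The function names (`unet`, `elastic_step`), the patch/stride/guidance values, and the way the two signals are mixed are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def unet(latent, t, text_emb=None):
    # Dummy epsilon-predictor standing in for the pretrained UNet; it only
    # mirrors the input shape so the sketch runs end to end.
    return torch.randn_like(latent)


def elastic_step(latent, t, text_emb, native=64, patch=64, stride=48, guidance=7.5):
    b, c, h, w = latent.shape
    assert h >= patch and w >= patch, "sketch assumes the latent is at least patch-sized"

    # Global content: downsample to the native training resolution, take a
    # classifier-free-guidance-style direction (cond - uncond), upsample back.
    ref = F.interpolate(latent, size=(native, native), mode="bilinear", align_corners=False)
    global_dir = unet(ref, t, text_emb) - unet(ref, t, None)
    global_dir = F.interpolate(global_dir, size=(h, w), mode="bilinear", align_corners=False)

    # Local content: unconditional predictions on overlapping native-size
    # patches, averaged back into a full-size noise map.
    eps_local = torch.zeros_like(latent)
    weight = torch.zeros(1, 1, h, w)
    ys = sorted({*range(0, h - patch + 1, stride), h - patch})
    xs = sorted({*range(0, w - patch + 1, stride), w - patch})
    for y in ys:
        for x in xs:
            tile = latent[:, :, y:y + patch, x:x + patch]
            eps_local[:, :, y:y + patch, x:x + patch] += unet(tile, t, None)
            weight[:, :, y:y + patch, x:x + patch] += 1.0
    eps_local = eps_local / weight

    # Combine the fine-grained local prediction with the scaled global direction.
    return eps_local + guidance * global_dir


# Example: one step on a 96x128 latent (a non-square, non-native size).
latent = torch.randn(1, 4, 96, 128)
eps = elastic_step(latent, t=torch.tensor([500]), text_emb=torch.randn(1, 77, 768))
print(eps.shape)  # torch.Size([1, 4, 96, 128])
```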
ElasticDiffusion was evaluated on the CelebA-HQ and LAION-COCO datasets, yielding better image coherence than counterparts such as MultiDiffusion and the standard decoding strategy of Stable Diffusion. Resampling techniques and a Reduced-Resolution Guidance strategy further strengthen the method, enhancing global content resolution while mitigating potential artifacts.
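One way to read the Reduced-Resolution Guidance idea is that the global signal is kept at a lower resolution and resampled up with a weight that decays over the denoising trajectory, so upsampling artifacts are limited. The sketch below illustrates only that generic reading; the function name, the `reduced_hw` parameter, and the linear decay schedule are assumptions for illustration and are not taken from the paper.

```python
import torch
import torch.nn.functional as F


def reduced_resolution_guidance(global_dir, step, total_steps, target_hw, reduced_hw):
    # Hold the global direction at a reduced resolution, then resample it up to
    # the target size; the coarse signal limits high-frequency upsampling artifacts.
    coarse = F.interpolate(global_dir, size=reduced_hw, mode="bilinear", align_corners=False)
    resampled = F.interpolate(coarse, size=target_hw, mode="bilinear", align_corners=False)
    # A linearly decaying weight is one simple annealing choice (an assumption here),
    # letting the coarse global guidance dominate early and fade out late.
    weight = 1.0 - step / max(total_steps - 1, 1)
    return weight * resampled


# Example: guidance for a 96x128 latent, computed at a reduced 48x64 resolution.
g = torch.randn(1, 4, 96, 128)
out = reduced_resolution_guidance(g, step=10, total_steps=50, target_hw=(96, 128), reduced_hw=(48, 64))
print(out.shape)  # torch.Size([1, 4, 96, 128])
```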