DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Method to speed up diffusion model inference across multiple GPUs
DistriFusion is a method designed to accelerate diffusion model inference across multiple GPUs, improving the speed and efficiency of AI-generated content creation. It divides the image into patches and assigns each patch to a separate GPU for simultaneous generation. Unlike naive patch splitting, which either suffers from communication bottlenecks or produces visible seams between patches, DistriFusion maintains interactions between patches while keeping communication overhead low.
DistriFusion's core innovation is patch parallelism, a technique that distributes the image-generation workload across multiple GPU devices. Communication is fully synchronized only at the first denoising step; subsequent steps reuse slightly stale, pre-computed activations from the previous step to provide cross-patch context, so communication can overlap with computation. This reduces latency without sacrificing image quality. Through careful co-design of the inference framework, DistriFusion maximizes GPU utilization and minimizes per-device computation, yielding significant speedups for high-resolution image generation.
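The reuse of stale activations can be illustrated with a toy sketch. This is not the authors' implementation: `toy_layer`, the row-wise patch split, and the single-process loop are all simplifications standing in for real diffusion layers and multi-GPU communication. The sketch only shows the control flow: a full synchronization at the first step, then reuse of the previous step's gathered map as context.

```python
import numpy as np

def toy_layer(patch, context):
    # Hypothetical stand-in for a layer (e.g. attention) that needs
    # information from the full activation map, not just the local patch.
    return 0.5 * patch + 0.1 * context.mean()

def distrifusion_step(patches, stale_full, first_step):
    """One denoising step under patch parallelism (toy sketch).

    patches:    per-device latent patches (split along rows here).
    stale_full: full activation map gathered at the previous step.
    first_step: synchronize fully only at the first step; later steps
                reuse the stale map, so communication can be hidden.
    """
    if first_step:
        context = np.concatenate(patches, axis=0)  # synchronous all-gather
    else:
        context = stale_full                       # reuse stale activations
    new_patches = [toy_layer(p, context) for p in patches]
    return new_patches, np.concatenate(new_patches, axis=0)

# Simulate 3 denoising steps with the latent split across 2 "devices".
latent = np.ones((8, 8))
patches = np.split(latent, 2, axis=0)
stale = None
for step in range(3):
    patches, stale = distrifusion_step(patches, stale, first_step=(step == 0))
print(stale.shape)  # full map retains the original (8, 8) shape
```

In the real system, the stale activations would be exchanged asynchronously between GPUs while the current step's computation proceeds, which is what hides the communication cost.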
Empirical analysis demonstrates that DistriFusion preserves image quality better than baseline parallelization methods, even when spreading inference across multiple GPUs. By maintaining interaction between image patches, it avoids visible seams and produces cohesive outputs comparable to single-device generation.