Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Utilizes attention maps for runtime pruning of redundant tokens, achieving significant efficiency gains without retraining.
Efforts to improve the efficiency of Diffusion Models (DMs) for text-to-image generation have led to the Attention-Driven Training-Free Efficient Diffusion Model (AT-EDM) framework. AT-EDM introduces an attention-based token pruning technique together with a schedule that adapts the pruning strategy to the denoising step.
In the token pruning scheme, attention maps are used to assign an importance score to each token, and the Generalized Weighted Page Rank (G-WPR) algorithm identifies redundant tokens from these scores. A pruning mask is generated from the importance scores, and the masked tokens are pruned after the feed-forward layer of the attention blocks. To maintain image quality, pruned tokens are later recovered with a similarity-based copy method, so that the information they carried is not lost.
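As a rough illustration of this pipeline, the sketch below scores tokens by iterating importance over a self-attention map in a weighted-PageRank spirit, keeps the top-scoring tokens, and restores pruned positions by copying the most similar kept token. The function names, the keep ratio, and the use of cosine similarity on pre-pruning features are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
# A minimal sketch (not the authors' implementation) of attention-driven
# token pruning and similarity-based recovery. The attention map `attn`
# is assumed to be averaged over heads, with attn[i, j] the weight that
# query token i places on key token j.
import torch
import torch.nn.functional as F

def gwpr_scores(attn: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    """Propagate importance through the attention graph, in the spirit of
    a weighted-PageRank update: tokens that receive attention from
    important tokens become important themselves."""
    n = attn.shape[-1]
    scores = torch.full((n,), 1.0 / n, device=attn.device)
    for _ in range(n_iters):
        scores = attn.transpose(-2, -1) @ scores  # redistribute along attention edges
        scores = scores / scores.sum()            # renormalize to a distribution
    return scores

def prune_mask(scores: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Boolean mask that keeps the highest-scoring tokens."""
    k = max(1, int(scores.numel() * keep_ratio))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.topk(k).indices] = True
    return mask

def similarity_copy_recover(x_pruned: torch.Tensor,
                            x_before: torch.Tensor,
                            mask: torch.Tensor) -> torch.Tensor:
    """Restore pruned positions by copying the feature of the most similar
    kept token (cosine similarity on the pre-pruning features, an
    assumption for this sketch)."""
    out = torch.empty_like(x_before)
    out[mask] = x_pruned
    kept = F.normalize(x_before[mask], dim=-1)
    dropped = F.normalize(x_before[~mask], dim=-1)
    nearest = (dropped @ kept.T).argmax(dim=-1)   # index of closest kept token
    out[~mask] = x_pruned[nearest]
    return out
```

Here `x_before` would be the token sequence entering the block; after pruning, downstream layers run on the shorter sequence, and recovery produces a full-length sequence for any layer that requires it.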
The Denoising-Steps-Aware Pruning (DSAP) schedule optimizes when pruning is applied. Based on the variance trend of attention maps across denoising steps, DSAP leaves certain attention blocks unpruned in the early denoising steps, where aggressive pruning would harm image quality, and adjusts the pruning budget to the requirements of each step, improving generation quality.
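The schedule itself can be sketched as a step-dependent keep ratio. The warm-up fraction and final budget below are placeholder values, since the paper derives its schedule from the observed variance of attention maps rather than from a fixed threshold.

```python
# Hypothetical DSAP-style schedule (thresholds are illustrative only):
# prune nothing in the early denoising steps, where attention maps are
# still unstable, then apply the full pruning budget afterwards.
def dsap_keep_ratio(step: int, total_steps: int,
                    warmup_frac: float = 0.3,
                    final_keep_ratio: float = 0.5) -> float:
    """Fraction of tokens to keep at a given denoising step."""
    if step < int(warmup_frac * total_steps):
        return 1.0            # early steps: leave attention blocks unpruned
    return final_keep_ratio   # later steps: prune down to the budget
```

Returning 1.0 during the warm-up phase is equivalent to leaving those attention blocks unpruned, matching the behavior described above.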

