FreeStyle : Free Lunch for Text-guided Style Transfer using Diffusion Models
A style transfer method built on pre-trained large diffusion models that performs style transfer from text descriptions alone, using a dual-stream encoder and single-stream decoder architecture to decouple content and style.
The paper introduces FreeStyle, a novel style transfer method based on pre-trained large diffusion models, which enables style transfer solely through a text description of the desired style, eliminating the need for style images. FreeStyle is built upon a dual-stream encoder and single-stream decoder architecture, replacing the conventional U-Net in diffusion models. The dual-stream encoder separately encodes the content image and style text prompt as inputs, achieving content and style decoupling. In the decoder, features from the dual streams are modulated based on a given content image and the corresponding style text prompt for precise style transfer. The method requires no further optimization and exhibits high-quality synthesis and fidelity across various content images and style text prompts.
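The dual-stream encoder / single-stream decoder flow can be sketched as below. This is a minimal toy illustration, not the paper's implementation: `encode_content`, `encode_style`, `decode`, and `FEAT_DIM` are hypothetical stand-ins for the U-Net content stream, the text-conditioned style stream, and the shared decoder, whose exact operators the summary does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 16  # hypothetical feature width, chosen only for illustration

def encode_content(image: np.ndarray) -> np.ndarray:
    """Content stream: stand-in for the U-Net encoder run on the content image
    (here, a fixed random projection of the flattened image)."""
    proj = rng.standard_normal((FEAT_DIM, image.size))
    return proj @ image.ravel()

def encode_style(prompt: str) -> np.ndarray:
    """Style stream: stand-in for text-conditioned encoder features
    (a deterministic toy embedding seeded by the prompt characters)."""
    seed = sum(map(ord, prompt)) % (2**16)
    return np.random.default_rng(seed).standard_normal(FEAT_DIM)

def decode(content_feat: np.ndarray, style_feat: np.ndarray) -> np.ndarray:
    """Single-stream decoder: consumes both streams; here a plain sum,
    where the real decoder modulates and fuses U-Net features."""
    return content_feat + style_feat

content_image = rng.random((8, 8))
out = decode(encode_content(content_image), encode_style("van Gogh oil painting"))
print(out.shape)  # (16,)
```

The point of the sketch is the data flow: the two inputs never share an encoder, so content and style remain decoupled until the decoder fuses them.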
The paper distinguishes FreeStyle from finetune-based and inversion-based methods. Finetune-based methods optimize model parameters to embed a given visual style into the diffusion model's output domain, while inversion-based methods learn specific style concepts as textual tokens that guide style-specific generation. FreeStyle instead leverages the intrinsic style reconstruction ability of the diffusion model, achieving effective style transfer without any optimization or style reference image. The proposed feature fusion module modulates the features of the image content and the corresponding style text prompt to balance content preservation against artistic consistency.
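The content-versus-style balance in the fusion module can be illustrated with a simple weighted combination. The function and the trade-off parameter `sigma` below are assumptions for illustration only; the summary states that features are modulated to balance the two objectives but does not give the actual operator.

```python
import numpy as np

def fuse_features(f_content, f_style, sigma=0.6):
    # Hypothetical fusion: a per-channel convex combination, where sigma
    # trades style strength against content preservation. FreeStyle's
    # actual module modulates U-Net features; this is only a sketch.
    f_content = np.asarray(f_content, dtype=float)
    f_style = np.asarray(f_style, dtype=float)
    return (1.0 - sigma) * f_content + sigma * f_style

c = np.ones(4)   # toy content features
s = np.zeros(4)  # toy style features
fused = fuse_features(c, s, sigma=0.25)
print(fused)  # [0.75 0.75 0.75 0.75]
```

Setting `sigma` near 0 reproduces the content features (strong preservation); pushing it toward 1 lets the style features dominate, mirroring the balance the fusion module is said to control.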
The experimental results demonstrate the robustness and generalization ability of FreeStyle across diverse styles and content, including buildings, landscapes, animals, and human portraits. The method achieves accurate style expression and high-quality content-style fusion, outperforming state-of-the-art techniques in naturalness of stylization and robustness. Qualitative and quantitative comparisons further validate that FreeStyle transfers style consistently across a range of style transfer tasks and content images.
Comments
None