StyleBooth: Image Style Editing with Multimodal Instruction
A method for image style editing that integrates textual instructions and image exemplars, both to generate high-quality training data and to improve the quality of the edited images.
StyleBooth is a method for image style editing that combines text-based and exemplar-based instructions, producing high-quality stylized images across diverse editing tasks. It builds a unified framework on a pre-trained editing model that accepts multimodal instructions: the reference image and the text are encoded independently, transformed and aligned in a shared latent space, and injected into the generative network as editing guidance. A Scale Weighting Mechanism balances how strongly the visual elements of the exemplar appear in the final result.
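The conditioning pathway described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the module names, embedding dimensions, and the exact form of the scale weighting are assumptions; the sketch only shows the pattern of encoding text and exemplar separately, projecting both into one latent space, rescaling the visual tokens, and concatenating them for injection (e.g. via cross-attention) into the editor.

```python
# Hedged sketch of multimodal instruction guidance. All names and
# dimensions are illustrative assumptions, not StyleBooth's actual API.
import torch
import torch.nn as nn

class MultimodalCondition(nn.Module):
    def __init__(self, text_dim=768, image_dim=1024, latent_dim=768):
        super().__init__()
        # Independent projections align both modalities in one latent space.
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.image_proj = nn.Linear(image_dim, latent_dim)

    def forward(self, text_emb, image_emb, scale=1.0):
        # text_emb: (B, T_text, text_dim); image_emb: (B, T_img, image_dim)
        t = self.text_proj(text_emb)
        # The scale factor stands in for the Scale Weighting Mechanism,
        # controlling how strongly exemplar features steer the edit.
        v = self.image_proj(image_emb) * scale
        # Concatenated token sequence is injected into the generative
        # network as editing guidance (e.g. through cross-attention).
        return torch.cat([t, v], dim=1)

cond = MultimodalCondition()
text_tokens = torch.randn(1, 77, 768)      # e.g. a CLIP-style text encoding
exemplar_tokens = torch.randn(1, 16, 1024) # e.g. pooled exemplar features
guidance = cond(text_tokens, exemplar_tokens, scale=0.5)
print(guidance.shape)  # torch.Size([1, 93, 768])
```

Keeping the two encoders separate and merging only in the latent space lets the same editor handle text-only, exemplar-only, or combined instructions by varying which tokens are supplied.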
To construct a high-quality style editing dataset, StyleBooth starts from unpaired sets of stylized and plain images, including T2I-synthesized ones. Iterative style-adding and style-removal transformations turn these into paired training examples, and a filter mechanism applied during multi-round Iterative Style-Destyle Tuning and Editing discards low-fidelity pairs, refining transformation quality at each round. The resulting dataset covers a wide range of styles and consists of stylized/plain image pairs with identical content, providing clean supervision for training.
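The pairing-and-filtering loop above can be sketched in outline. This is a hypothetical stand-in, not the paper's implementation: the `destyle`, `stylize`, and `quality` functions are placeholders (a real pipeline would run the editing model and might score content consistency with something like CLIP similarity); only the round structure, pair construction in both directions, and threshold filtering reflect the described procedure.

```python
# Illustrative sketch of iterative style-destyle pair construction.
# All function bodies are toy stand-ins for the actual editing models.

def destyle(image):
    # Stand-in for the style-removal edit: stylized -> plain.
    return image.replace("stylized:", "plain:")

def stylize(image):
    # Stand-in for the style-adding edit: plain -> stylized.
    return image.replace("plain:", "stylized:")

def quality(pair):
    # Hypothetical filter score: 1.0 when content is identical across
    # the pair, 0.0 otherwise (a real filter would be a learned metric).
    src, tgt = pair
    return 1.0 if src.split(":", 1)[1] == tgt.split(":", 1)[1] else 0.0

def build_pairs(stylized_images, plain_images, rounds=2, thresh=0.5):
    pairs = []
    for _ in range(rounds):
        # Stylized images yield pairs via destyling; plain images yield
        # pairs via style adding, so every source image becomes a pair.
        pairs = [(img, destyle(img)) for img in stylized_images]
        pairs += [(stylize(img), img) for img in plain_images]
        # Filtering keeps only pairs whose content matches, so each
        # round of tuning sees progressively cleaner supervision.
        pairs = [p for p in pairs if quality(p) >= thresh]
    return pairs

pairs = build_pairs(["stylized:cat"], ["plain:dog"])
print(pairs)  # [('stylized:cat', 'plain:cat'), ('stylized:dog', 'plain:dog')]
```

The key property the filter enforces is the one the dataset claims: each surviving pair shares identical content and differs only in style.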