InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Author: ReSrch-D2
Published: 5/1/2024, 3:49:41 PM
Category: Research

Uses a masked cross-attention mechanism and a multimodal embedding stack to preserve both single-ID and multi-ID attributes in generated images, while scaling to more IDs than the model was trained with.

https://arxiv.org/abs/2404.19427

Abstract

In personalized image generation, the ability to create images that preserve concepts has improved significantly. However, creating an image that naturally integrates multiple concepts into a cohesive, visually appealing composition remains challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method preserves ID effectively because it uses global and local features from a pre-trained face recognition model, integrated with text conditions. Additionally, our masked cross-attention mechanism enables precise control of multi-ID and composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Our model also achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, it exhibits remarkable scalability, preserving a greater number of IDs than it was originally trained with.
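
To make the idea of masked cross-attention concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes that the context is a stack of text tokens followed by a fixed number of face tokens per ID, that every image position may attend to the text tokens, and that a position may attend to an ID's face tokens only when it lies inside that ID's region mask. The names build_allow_mask, masked_cross_attention, region_masks, and tokens_per_id are hypothetical and chosen for illustration only.

```python
import numpy as np


def build_allow_mask(region_masks, n_text, tokens_per_id):
    """Build an (N, M) boolean mask of permitted query-to-context links.

    region_masks: (K, N) bool, one flattened region mask per ID (assumed layout).
    Context layout (assumed): n_text text tokens, then tokens_per_id tokens
    for ID 0, then tokens_per_id tokens for ID 1, and so on.
    """
    n_positions = region_masks.shape[1]
    # Every image position may attend to every text token (requires n_text >= 1,
    # so that no row of the mask is entirely False).
    text_part = np.ones((n_positions, n_text), dtype=bool)
    # A position may attend to an ID's tokens only inside that ID's region.
    id_part = np.repeat(region_masks.T, tokens_per_id, axis=1)
    return np.concatenate([text_part, id_part], axis=1)


def masked_cross_attention(queries, keys, values, allow):
    """Scaled dot-product cross-attention with disallowed links masked out."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores = np.where(allow, scores, -np.inf)   # blocked links get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


# Toy example: 6 image positions, 3 text tokens, 2 IDs with 2 face tokens each.
rng = np.random.default_rng(0)
n_positions, d, n_text, tokens_per_id = 6, 8, 3, 2
region_masks = np.array(
    [[1, 1, 0, 0, 0, 0],   # ID 0 occupies positions 0 and 1
     [0, 0, 0, 1, 1, 0]],  # ID 1 occupies positions 3 and 4
    dtype=bool,
)
queries = rng.normal(size=(n_positions, d))
context = rng.normal(size=(n_text + 2 * tokens_per_id, d))  # stacked embeddings
allow = build_allow_mask(region_masks, n_text, tokens_per_id)
out, weights = masked_cross_attention(queries, context, context, allow)
```

In this toy setup, positions inside ID 0's region put zero weight on ID 1's face tokens and vice versa, and background positions attend to the text tokens only. That is one plausible reading of how a mask could keep identities from mixing across faces.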
