MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
MIGC aims to generate multiple instances in a single image under diverse controls, so that each instance complies with its user-given description and layout while all instances remain globally aligned.
The paper introduces the task of Multi-Instance Generation (MIG): generating multiple instances in one image under diverse controls, such as quantity, position, attribute, and interaction. MIG is hard because every instance must comply with its user-given description and layout while the image as a whole stays globally aligned. The authors propose the Multi-Instance Generation Controller (MIGC), which follows a divide-and-conquer strategy. It divides the complex MIG task into simpler single-instance shading subtasks, solves each subtask with an Enhancement Attention layer, and combines the per-instance shading results through a Layout Attention layer and a Shading Aggregation Controller.
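The "divide" step can be illustrated with a minimal sketch, assuming the layout is given as normalized bounding boxes `(x0, y0, x1, y1)` and that each instance's shading is confined to its own box. The function names `boxes_to_masks` and `shade_instances` are hypothetical and are not taken from the paper or its code; the random arrays stand in for features that a real model would compute with attention.

```python
import numpy as np

def boxes_to_masks(boxes, height, width):
    """Rasterize normalized (x0, y0, x1, y1) boxes into binary masks of shape (N, H, W).

    Assumption: one box per instance, coordinates in [0, 1].
    """
    masks = np.zeros((len(boxes), height, width), dtype=np.float32)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        r0, r1 = int(round(y0 * height)), int(round(y1 * height))
        c0, c1 = int(round(x0 * width)), int(round(x1 * width))
        masks[i, r0:r1, c0:c1] = 1.0
    return masks

def shade_instances(instance_features, masks):
    """Keep each instance's features only inside its own box.

    instance_features: (N, H, W, C) placeholder per-instance features.
    masks:             (N, H, W) binary masks from boxes_to_masks.
    """
    return instance_features * masks[..., None]

# Two instances on an 8x8 grid: top-left and bottom-right quadrants.
boxes = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
masks = boxes_to_masks(boxes, 8, 8)
features = np.random.default_rng(0).normal(size=(2, 8, 8, 4)).astype(np.float32)
shaded = shade_instances(features, masks)
```

Each slice `shaded[i]` is zero outside instance `i`'s box, which is the sense in which the subtasks are independent before they are combined.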
MIGC leverages Stable Diffusion's capacity for single-instance generation to facilitate the MIG task. It introduces three main modules: Enhancement Attention, Layout Attention, and the Shading Aggregation Controller. The Enhancement Attention layer alleviates missing instances, the Layout Attention layer improves the quality of the generated image, and the Shading Aggregation Controller helps aggregate the per-instance shading results. The paper also discusses how MIG is divided into instance shading subtasks in the Cross-Attention space and the benefits of this division, such as improved performance metrics and control capabilities.
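The "combine" step can be pictured as a per-pixel weighted sum over candidate shading results. The sketch below is an illustration under stated assumptions, not the paper's Shading Aggregation Controller: the weights here come from hypothetical `logits` passed in by the caller, whereas a real controller would predict them with learned layers.

```python
import numpy as np

def aggregate_shadings(shadings, logits):
    """Blend K candidate shading maps with per-pixel softmax weights.

    shadings: (K, H, W, C) candidate results (e.g. per-instance and layout-level).
    logits:   (K, H, W) unnormalized scores; hypothetical stand-in for learned weights.
    Returns the blended map (H, W, C) and the weights (K, H, W).
    """
    # Numerically stable softmax over the candidate axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    blended = (weights[..., None] * shadings).sum(axis=0)
    return blended, weights

rng = np.random.default_rng(0)
candidates = rng.normal(size=(3, 8, 8, 4))
uniform_logits = np.zeros((3, 8, 8))
blended, weights = aggregate_shadings(candidates, uniform_logits)
```

With equal logits the result reduces to a plain average of the candidates; raising one candidate's logit at a pixel shifts that pixel toward that candidate.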