Cross-Image Attention for Zero-Shot Appearance Transfer
A zero-shot method that leverages the semantic understanding of text-to-image generative models to transfer visual appearance between images of objects that share semantics but differ in shape.
Cross-Image Attention for Zero-Shot Appearance Transfer transfers visual appearance between semantically related objects. The approach taps into the semantic knowledge already embedded in pretrained generative models, exchanging visual attributes without per-example optimization or additional model training.
The method builds on the self-attention layers of the generative model's denoising network, introducing a cross-image attention mechanism that establishes semantic correspondences between two images: one providing the target structure and the other the desired appearance. During the denoising process, cross-image attention combines the structure image's queries with the appearance image's keys and values, yielding an output that blends the structure of one image with the appearance of the other.
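To make the mechanism concrete, here is a minimal PyTorch sketch, not the authors' implementation: the function name and tensor shapes are illustrative, and the full method includes additional components beyond this core operation. It replaces the usual self-attention softmax(QKᵀ/√d)V, in which Q, K, and V all come from the same image, with structure queries attending to appearance keys and values.

```python
import torch


def cross_image_attention(q_struct: torch.Tensor,
                          k_app: torch.Tensor,
                          v_app: torch.Tensor) -> torch.Tensor:
    """Cross-image attention for one attention layer.

    q_struct:      queries projected from the structure image's latent,
                   shape (batch, heads, tokens, head_dim).
    k_app, v_app:  keys/values projected from the appearance image's
                   latent, same shape.
    """
    scale = q_struct.shape[-1] ** -0.5
    # The attention weights act as soft semantic correspondences: each
    # structure token attends to the appearance tokens most similar to it.
    attn = torch.softmax((q_struct @ k_app.transpose(-2, -1)) * scale, dim=-1)
    # Aggregating appearance values by those correspondences paints the
    # appearance features onto the structure image's spatial layout.
    return attn @ v_app


# Hypothetical usage: both latents are denoised jointly, and inside each
# self-attention layer the structure branch routes its queries to the
# appearance branch's keys and values.
if __name__ == "__main__":
    B, H, N, d = 1, 8, 64 * 64, 40  # e.g. 64x64 latent tokens in a UNet block
    q_s = torch.randn(B, H, N, d)   # from the structure latent
    k_a = torch.randn(B, H, N, d)   # from the appearance latent
    v_a = torch.randn(B, H, N, d)
    out = cross_image_attention(q_s, k_a, v_a)
    print(out.shape)                # torch.Size([1, 8, 4096, 40])
```

Because the correspondences emerge from the model's own attention features, no explicit alignment between the two images is needed; the mechanism simply substitutes the keys and values at existing self-attention layers.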
What sets this method apart is its zero-shot nature: it requires no optimization or training. Experiments demonstrate its effectiveness across diverse object categories and its robustness to variations in shape, size, and viewpoint between the two input images.