Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
A methodology for human image animation that couples a 3D human parametric model with a latent diffusion framework to improve shape alignment and motion guidance in generative human animation.
Champ presents a new approach to human image animation that integrates the SMPL 3D parametric human model into a latent diffusion framework, aiming to improve shape alignment and motion guidance in the animation process. By leveraging the SMPL model's unified representation of shape and pose variations, together with rendered depth, normal, and semantic maps, the method captures realistic human shapes and movements more faithfully. Skeleton-based motion guidance and self-attention over the guidance feature maps further refine the animation process, enabling dynamic visual content that accurately reflects human anatomy and movement.
The method builds on a video diffusion model that incorporates motion guidance derived from the SMPL 3D human parametric model. A continuous sequence of SMPL poses is extracted from motion data, yielding multi-level guidance that encapsulates both 2D and 3D characteristics and deepens the model's comprehension of human shape and pose attributes. A motion embedding module incorporates this multi-level guidance into the model: the latent embeddings of each guidance condition are refined with self-attention and then fused by a multi-layer motion fusion module.
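The fusion step described above can be sketched in a few lines of PyTorch. The snippet below is an illustrative stand-in, not the paper's actual implementation: the module name `MotionFusion`, the choice of four guidance conditions (depth, normal, semantic, skeleton), and the sum-based fusion are assumptions made for the example.

```python
import torch
import torch.nn as nn

class MotionFusion(nn.Module):
    """Sketch: refine per-condition guidance embeddings with
    self-attention across conditions, then fuse them into one
    motion-guidance embedding. Names and details are illustrative."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, guidance: list) -> torch.Tensor:
        # guidance: list of (B, dim) latent embeddings, one per condition
        x = torch.stack(guidance, dim=1)    # (B, num_conditions, dim)
        attn_out, _ = self.attn(x, x, x)    # attend across the conditions
        x = self.norm(x + attn_out)         # residual connection + norm
        return self.proj(x.sum(dim=1))      # fuse by summation -> (B, dim)

# Toy usage: four guidance embeddings (depth, normal, semantic, skeleton)
fusion = MotionFusion(dim=64)
conds = [torch.randn(2, 64) for _ in range(4)]
fused = fusion(conds)
print(fused.shape)  # torch.Size([2, 64])
```

Attending across the stacked conditions lets each guidance type (e.g. depth) modulate the others before fusion, which is the intuition behind refining the embeddings jointly rather than fusing them naively.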
The network pipeline integrates the SMPL model, the motion guidance, a VAE encoder, a CLIP image encoder, ReferenceNet, and a temporal alignment module. The SMPL model maps low-dimensional pose and shape parameters to a 3D mesh with vertex-wise weights for human part segmentation. The VAE encoder encodes the reference image, whose features are passed to ReferenceNet to maintain appearance consistency in the generated video, while the temporal alignment module applies temporal attention across frames to ensure frame-to-frame coherence.