Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
An avatar representation that models animatable human avatars from RGB videos by combining 2D CNNs with explicit 3D Gaussian splatting.
The paper presents a method for creating high-fidelity, animatable human avatars with realistic pose-dependent appearance. It combines 2D convolutional neural networks (CNNs) with explicit 3D Gaussian splatting to capture fine-grained dynamic details. The pipeline begins by reconstructing a character-specific template using an MLP-based signed distance field (SDF) and color field; the template is optimized to faithfully capture details such as clothing wrinkles and textures. The avatar itself is represented by a conditional StyleGAN-based generator adapted to predict both front and back Gaussian maps, i.e., 2D maps whose pixels store the parameters of 3D Gaussians. The generator is trained with the Adam optimizer under task-specific loss functions.
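The paper's actual generator is StyleGAN-based; as a deliberately simplified stand-in that only illustrates the input/output contract (all class names, channel counts, and layer choices below are illustrative assumptions, not the paper's architecture), a plain CNN mapping posed position maps to per-pixel Gaussian parameters could look like:

```python
import torch
import torch.nn as nn

class GaussianMapPredictor(nn.Module):
    """Simplified stand-in for a pose-conditioned Gaussian-map generator.

    Input:  posed position maps for the front and back views,
            stacked along the channel axis -> (B, 6, H, W).
    Output: per-pixel 3D Gaussian parameters for both views,
            (B, 2 * 14, H, W): position offset (3) + rotation
            quaternion (4) + scale (3) + opacity (1) + color (3)
            = 14 channels per view (an assumed parameterization).
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * 14, 3, padding=1),
        )

    def forward(self, pos_maps):
        return self.net(pos_maps)

# Usage: position maps rendered from the posed template.
maps = torch.randn(1, 6, 512, 512)
gaussian_maps = GaussianMapPredictor()(maps)   # (1, 28, 512, 512)
```

Each output pixel parameterizes one 3D Gaussian, so the full Gaussian set is obtained by reading the maps back at the body-covering pixels before splatting.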
On the implementation side, the paper details the optimization procedure for template reconstruction, including root finding to establish accurate correspondences in the canonical space, with the resulting equations solved by the Gauss-Newton method. The network architecture, training procedure, and loss weights of the avatar representation network are also specified. Finally, the paper describes the metrics used for quantitative evaluation, PSNR, SSIM, LPIPS, and FID, and how each is computed.
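As a minimal sketch of the root-finding step, assuming the common formulation where one seeks a canonical point whose forward skinning lands on the observed posed point (`forward_skinning` below is a hypothetical placeholder, e.g. linear blend skinning with learned weights):

```python
import torch

def gauss_newton_canonical(x_posed, forward_skinning, x_init,
                           num_iters=10, tol=1e-6):
    """Iteratively solve forward_skinning(x_c) = x_posed for x_c.

    forward_skinning: differentiable map from a canonical point (3,)
    to its posed position (3,); a placeholder for the skinning model.
    """
    x_c = x_init.clone()
    for _ in range(num_iters):
        residual = forward_skinning(x_c) - x_posed         # (3,)
        if residual.norm() < tol:
            break
        # Jacobian of the skinning map w.r.t. the canonical point, (3, 3).
        J = torch.autograd.functional.jacobian(forward_skinning, x_c)
        # Gauss-Newton step: solve the normal equations J^T J dx = J^T r.
        dx = torch.linalg.solve(J.T @ J, J.T @ residual)
        x_c = x_c - dx
    return x_c

# Toy check with a rigid "skinning" map whose inverse is known:
R = torch.tensor([[0., -1., 0.],
                  [1., 0., 0.],
                  [0., 0., 1.]])
t = torch.tensor([0.1, 0.2, 0.0])
x_posed = torch.tensor([1.0, 2.0, 3.0])
x_c = gauss_newton_canonical(x_posed, lambda x: R @ x + t, torch.zeros(3))
# After convergence, R @ x_c + t recovers x_posed.
```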
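As a small illustration of the first of these metrics (the definition below is the standard one, not taken from the paper's code), PSNR follows directly from the mean squared error:

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (in dB) between images in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# LPIPS is typically computed with the `lpips` package (Zhang et al.),
# e.g. lpips.LPIPS(net='vgg') on inputs scaled to [-1, 1]; SSIM and FID
# are likewise available in standard packages such as torchmetrics.
pred = torch.rand(1, 3, 256, 256)
target = torch.rand(1, 3, 256, 256)
print(psnr(pred, target))
```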