TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Deformation-based radiance fields framework for high-fidelity talking head synthesis
TalkingGaussian addresses the facial distortion problem of existing radiance-field-based methods with a deformation-based framework. It aims to synthesize high-quality talking head videos by applying deformations to a persistent head structure, which simplifies the representation of facial motion. The framework is built around a Deformable Gaussian Field, made up of a static Persistent Gaussian Field and a neural Grid-based Motion Field, so that the persistent head structure is separated from dynamic facial motion. The face and the inside-mouth area are further decoupled into separate branches, which simplifies each learning task and improves synthesis quality in both static structure and dynamic performance.
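The split between a persistent field and a motion field can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `PersistentGaussians`, the functions `toy_motion_field` and `deform`, the single tanh layer standing in for the grid-based network, the 4-dimensional condition vector, and the choice to deform only position, scale, and rotation are all assumptions made for illustration.

```python
import numpy as np


class PersistentGaussians:
    """Static head structure: one row per Gaussian primitive (hypothetical layout)."""

    def __init__(self, n, rng):
        self.position = rng.normal(size=(n, 3))                 # primitive centers
        self.scale = np.full((n, 3), 0.05)                      # per-axis extent
        self.rotation = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))   # unit quaternions (w, x, y, z)
        self.opacity = np.full((n, 1), 0.8)
        self.color = rng.uniform(size=(n, 3))


def toy_motion_field(position, condition, weights):
    """Stand-in for the neural grid-based motion field (assumption: a single
    tanh layer instead of a grid encoder plus MLP)."""
    cond = np.broadcast_to(condition, (len(position), condition.size))
    features = np.concatenate([position, cond], axis=1)
    out = np.tanh(features @ weights)                           # shape (n, 10)
    return out[:, :3], out[:, 3:6], out[:, 6:10]


def deform(gaussians, condition, weights, strength=0.01):
    """Return per-frame primitives; the persistent field itself is not modified."""
    d_pos, d_scale, d_rot = toy_motion_field(gaussians.position, condition, weights)
    rotation = gaussians.rotation + strength * d_rot
    rotation = rotation / np.linalg.norm(rotation, axis=1, keepdims=True)
    return {
        "position": gaussians.position + strength * d_pos,
        "scale": gaussians.scale * np.exp(strength * d_scale),  # stays positive
        "rotation": rotation,
        "opacity": gaussians.opacity,   # appearance is shared across frames
        "color": gaussians.color,
    }


rng = np.random.default_rng(0)
head = PersistentGaussians(n=100, rng=rng)
weights = rng.normal(size=(3 + 4, 10))   # 3 position dims + 4 condition dims
frame = deform(head, condition=np.array([0.2, 0.0, 0.7, 0.1]), weights=weights)
```

Every frame starts from the same canonical primitives and differs only in the predicted offsets, which is the sense in which the head structure persists while the motion varies.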
TalkingGaussian builds on 3D Gaussian Splatting (3DGS), which represents space explicitly as a set of Gaussian primitives and so allows accurate control of spatial points and a stable head structure. Because facial motion is expressed as deformation of the persistent head structure rather than as newly predicted appearance, the distortions caused by inaccurately predicted appearance are avoided. An incremental sampling strategy uses face action priors to schedule the optimization of the deformation, which smooths learning of the target facial motions. In addition, a Face-Mouth Decomposition module handles the motion inconsistency between the face and the inside-mouth region, leading to better audio-visual synchronization and more accurate mouth reconstruction.
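One way to picture the two-branch design is to render the face and the inside mouth separately and then blend them, with the face layered in front of the mouth. The sketch below assumes standard "over" alpha compositing with non-premultiplied colors; the function name `composite_head` and its arguments are hypothetical, and the summary above does not state the exact fusion rule used by the method.

```python
import numpy as np


def composite_head(face_rgb, face_alpha, mouth_rgb):
    """Blend the face branch over the inside-mouth branch.

    face_rgb, mouth_rgb: (H, W, 3) images with values in [0, 1]
    face_alpha:          (H, W) coverage of the face branch in [0, 1]
    """
    alpha = np.clip(face_alpha, 0.0, 1.0)[..., None]   # (H, W, 1) for broadcasting
    # Where the face is opaque it hides the mouth; where it is transparent
    # (for example between open lips) the mouth branch shows through.
    return face_rgb * alpha + mouth_rgb * (1.0 - alpha)


face = np.full((2, 2, 3), 0.8)
mouth = np.full((2, 2, 3), 0.2)
alpha = np.array([[1.0, 0.0],
                  [0.5, 1.0]])
head_image = composite_head(face, alpha, mouth)
```

Because each branch only has to explain its own region, the mouth interior can follow the audio closely without being dragged along by the slower motion of the surrounding face.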
The parameters of the Gaussian primitives are optimized by gradient descent under color supervision, with a densification strategy that controls the growth of primitives and prunes unnecessary ones. By maintaining a persistent head structure and decomposing conflicting motions into different spaces, TalkingGaussian is reported to synthesize more realistic and accurate talking heads than existing methods. The authors also highlight ethical considerations, emphasizing responsible use of the technique and support for the development of deepfake detection techniques.
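The densification and pruning step mentioned above can be sketched as follows. This is a simplified illustration in the spirit of 3DGS, not the method's actual procedure: the function `densify_and_prune`, the thresholds, and the jittered cloning are assumptions, and the clone-versus-split distinction based on primitive size is left out.

```python
import numpy as np


def densify_and_prune(position, opacity, grad_norm,
                      grad_threshold=0.5, min_opacity=0.05,
                      jitter=0.01, seed=0):
    """Clone primitives with large positional gradients, then drop faint ones.

    position:  (N, 3) primitive centers
    opacity:   (N,)   opacities in [0, 1]
    grad_norm: (N,)   accumulated gradient magnitude per primitive
    Threshold values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Densify: a large gradient suggests the region is under-reconstructed,
    # so add a slightly displaced copy of the primitive.
    grow = grad_norm > grad_threshold
    clones = position[grow] + jitter * rng.normal(size=(grow.sum(), 3))
    position = np.concatenate([position, clones])
    opacity = np.concatenate([opacity, opacity[grow]])

    # Prune: nearly transparent primitives contribute little to the image.
    keep = opacity >= min_opacity
    return position[keep], opacity[keep]


positions = np.zeros((4, 3))
opacities = np.array([0.9, 0.01, 0.5, 0.9])
gradients = np.array([1.0, 1.0, 0.0, 0.0])
new_positions, new_opacities = densify_and_prune(positions, opacities, gradients)
```

In a training loop a step like this would run periodically between gradient updates, so the number of primitives tracks the detail the color loss demands.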