GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Real-time generation of pose-controllable talking heads by leveraging 3D Gaussian Splatting and efficiently manipulating audio features.
GaussianTalker presents a framework for real-time synthesis of pose-controllable talking heads, advancing dynamic facial animation through the integration of 3D Gaussian splatting and a spatial-audio attention mechanism. Unlike previous methods, GaussianTalker capitalizes on the rapid scene-modeling capabilities of 3D Gaussian splatting, enabling fast and accurate manipulation of facial attributes synchronized with speech audio.
GaussianTalker constructs a static 3D Gaussian representation of the canonical head shape and dynamically deforms it in accordance with the audio input. This process involves extracting feature embeddings for each Gaussian position via a multi-resolution triplane, facilitating precise estimation of Gaussian attributes crucial for faithful facial animation.
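To make the triplane lookup concrete, here is a minimal sketch of querying per-Gaussian feature embeddings from a multi-resolution triplane. The class name `MultiResTriplane`, the grid resolutions, the feature dimension, and the Hadamard-product fusion of the three planes are all illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResTriplane(nn.Module):
    """Hypothetical multi-resolution triplane: each Gaussian's (x, y, z)
    center is projected onto the XY, XZ, and YZ planes, features are
    bilinearly sampled at several resolutions, and the results are fused."""
    def __init__(self, feat_dim=32, resolutions=(64, 128, 256)):
        super().__init__()
        # One learnable 2D feature grid per plane, per resolution (assumed sizes).
        self.planes = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(3, feat_dim, r, r)) for r in resolutions]
        )

    def forward(self, xyz):
        # xyz: (N, 3) canonical Gaussian centers, assumed normalized to [-1, 1].
        coords = torch.stack(
            [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]], dim=0
        )  # (3, N, 2): projections onto the XY, XZ, and YZ planes
        feats = []
        for planes in self.planes:
            # grid_sample takes (B, C, H, W) grids and (B, H_out, W_out, 2) coords.
            sampled = F.grid_sample(
                planes, coords.unsqueeze(2), align_corners=True
            )  # (3, C, N, 1)
            # Fuse the three planes by elementwise product (one common triplane choice).
            feats.append(sampled.squeeze(-1).prod(dim=0).T)  # (N, C)
        return torch.cat(feats, dim=-1)  # (N, C * num_resolutions)

embeddings = MultiResTriplane()(torch.rand(10000, 3) * 2 - 1)
print(embeddings.shape)  # torch.Size([10000, 96])
```

The per-Gaussian embedding returned here is what the downstream modules condition on when estimating each Gaussian's attributes.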
The framework's spatial-audio attention module fuses these per-Gaussian feature embeddings with audio features, predicting frame-wise offsets for each Gaussian attribute. This cross-attention mechanism improves stability and enables region-specific deformation across the large number of Gaussians, maintaining spatial consistency and intricate facial detail.
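A minimal sketch of such a cross-attention step follows: per-Gaussian embeddings act as queries, audio features as keys and values, and the attended features are decoded into per-Gaussian attribute offsets. The module name `SpatialAudioAttention`, the dimensions, and the 10-dimensional offset layout (position, rotation, scale) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SpatialAudioAttention(nn.Module):
    """Hypothetical cross-attention: per-Gaussian triplane embeddings are
    queries, the current frame's audio tokens are keys/values, and the
    output is decoded into frame-wise per-Gaussian attribute offsets."""
    def __init__(self, feat_dim=96, audio_dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=feat_dim, kdim=audio_dim, vdim=audio_dim,
            num_heads=heads, batch_first=True,
        )
        # Assumed offsets per Gaussian: 3 (position) + 4 (rotation) + 3 (scale).
        self.to_offsets = nn.Linear(feat_dim, 10)

    def forward(self, gauss_feats, audio_feats):
        # gauss_feats: (N, feat_dim) per-Gaussian embeddings for one frame
        # audio_feats: (T, audio_dim) audio tokens for the current window
        q = gauss_feats.unsqueeze(0)   # (1, N, feat_dim)
        kv = audio_feats.unsqueeze(0)  # (1, T, audio_dim)
        out, attn_weights = self.attn(q, kv, kv)
        offsets = self.to_offsets(out.squeeze(0))  # (N, 10)
        return offsets, attn_weights

module = SpatialAudioAttention()
offsets, weights = module(torch.randn(10000, 96), torch.randn(16, 64))
print(offsets.shape)  # torch.Size([10000, 10])
```

Because every Gaussian attends to the same audio tokens through shared weights, nearby Gaussians receive correlated offsets, which is what keeps the deformation spatially coherent.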
Moreover, GaussianTalker explores the disentanglement of audio-unrelated motion by analyzing attention scores across various experimental settings. By visualizing how eye features, viewpoint, and the null vector affect the attention distribution, the framework shows how effectively each input condition captures speech-related motion.
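One way to run this kind of analysis is to sum, for each Gaussian, the attention mass assigned to each condition's tokens. The sketch below assumes the conditions are concatenated along the key axis in a particular (hypothetical) layout; the function name and token split are illustrative, not the paper's code:

```python
import torch

def attention_mass_per_condition(attn_weights, segments):
    """Sum the attention each Gaussian assigns to each condition's tokens.

    attn_weights: (N, T) attention over T concatenated condition tokens
    segments: dict mapping condition name -> (start, end) token indices
    """
    return {
        name: attn_weights[:, s:e].sum(dim=-1)  # (N,) mass per Gaussian
        for name, (s, e) in segments.items()
    }

# Example layout (assumed): 16 audio tokens, then 1 eye, 1 viewpoint, 1 null token.
attn = torch.softmax(torch.randn(10000, 19), dim=-1)
mass = attention_mass_per_condition(
    attn, {"audio": (0, 16), "eye": (16, 17), "view": (17, 18), "null": (18, 19)}
)
print({k: float(v.mean()) for k, v in mass.items()})
```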