DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
An expressive talking head generation framework that combines a diffusion-based denoising network, a style-aware lip expert, and a style predictor to produce high-quality, accurately lip-synced animations without requiring style references.
The DreamTalk framework applies diffusion models to the task of expressive talking head generation. It comprises three components, sketched in the illustrative code below: a denoising network, a style-aware lip expert, and a style predictor, together extending what audio-driven face animation can achieve.
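For intuition, the three components might be organized as in the following minimal PyTorch-style sketch. All class names, interfaces, dimensions, and layer choices here are illustrative assumptions for exposition and do not reflect DreamTalk's actual implementation:

```python
import torch
import torch.nn as nn


class DenoisingNetwork(nn.Module):
    """Predicts clean face-motion parameters from a noisy sample, conditioned
    on audio features, a style code, and the diffusion timestep (assumed MLP)."""

    def __init__(self, motion_dim=64, audio_dim=256, style_dim=128, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + audio_dim + style_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, noisy_motion, audio_feat, style_code, t):
        # Naive scalar timestep embedding; real models typically use sinusoidal embeddings.
        t_emb = t.float().unsqueeze(-1) / 1000.0
        x = torch.cat([noisy_motion, audio_feat, style_code, t_emb], dim=-1)
        return self.net(x)


class StylePredictor(nn.Module):
    """Diffusion-based module that predicts a speaking-style code directly
    from audio, removing the need for reference videos or text."""

    def __init__(self, audio_dim=256, style_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + style_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, style_dim),
        )

    def forward(self, noisy_style, audio_clip_feat, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0
        x = torch.cat([noisy_style, audio_clip_feat, t_emb], dim=-1)
        return self.net(x)


class StyleAwareLipExpert(nn.Module):
    """Scores how well lip motions match the audio under a given speaking
    style; such a score can guide lip-sync during training."""

    def __init__(self, motion_dim=64, audio_dim=256, style_dim=128, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(motion_dim + audio_dim + style_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, lip_motion, audio_feat, style_code):
        return self.score(torch.cat([lip_motion, audio_feat, style_code], dim=-1))
```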
The diffusion-based denoising network is the core component: driven by audio cues, it consistently synthesizes high-quality face motions that reflect diverse expressions. To improve the expressiveness and accuracy of lip motions, a style-aware lip expert guides lip-sync while remaining sensitive to the speaking style. Finally, an additional diffusion-based style predictor infers the target expression directly from the audio input, removing the need for expression reference videos or text.
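Putting the pieces together, the two-stage inference flow described above (audio-only style prediction followed by style-conditioned motion denoising) might look like the simplified sketch below, reusing the illustrative modules from the previous snippet. The naive update rule stands in for a proper DDPM/DDIM sampler, and all shapes and feature extractors are assumptions rather than DreamTalk's actual pipeline:

```python
import torch


@torch.no_grad()
def generate_motion(style_predictor, denoiser, audio_feat, audio_clip_feat,
                    num_steps=50, motion_dim=64, style_dim=128):
    """Illustrative two-stage inference: predict a style code from audio,
    then denoise per-frame face motions conditioned on audio and that style."""
    # Stage 1: diffusion-based style prediction from a clip-level audio feature.
    style = torch.randn(audio_clip_feat.shape[0], style_dim)
    for t in reversed(range(num_steps)):
        t_batch = torch.full((style.shape[0],), t)
        # Simplified update: step toward the predicted clean style code
        # (a real sampler would follow a DDPM/DDIM noise schedule).
        style = 0.5 * style + 0.5 * style_predictor(style, audio_clip_feat, t_batch)

    # Stage 2: audio- and style-conditioned denoising of per-frame face motions.
    motion = torch.randn(audio_feat.shape[0], motion_dim)
    style_frames = style.expand(motion.shape[0], -1)  # broadcast style over frames
    for t in reversed(range(num_steps)):
        t_batch = torch.full((motion.shape[0],), t)
        motion = 0.5 * motion + 0.5 * denoiser(motion, audio_feat, style_frames, t_batch)
    return motion  # face-motion parameters driving the final renderer


# Example usage with the illustrative modules above (shapes are assumptions):
# denoiser, style_predictor = DenoisingNetwork(), StylePredictor()
# audio_feat = torch.randn(120, 256)     # per-frame audio features
# audio_clip_feat = torch.randn(1, 256)  # clip-level audio feature
# motion = generate_motion(style_predictor, denoiser, audio_feat, audio_clip_feat)
```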