DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
An expressive talking head generation framework that combines a diffusion-based denoising network, a style-aware lip expert, and a style predictor to produce high-quality, accurately lip-synced animations without requiring style references.
The DreamTalk framework applies diffusion models to the task of expressive talking head generation. It comprises three components, sketched in the illustrative code below: a denoising network, a style-aware lip expert, and a style predictor, together extending what audio-driven face animation can achieve.
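For intuition, the three components might be organized as in the following minimal PyTorch-style sketch. All class names, interfaces, dimensions, and layer choices here are illustrative assumptions for exposition and do not reflect DreamTalk's actual implementation:

```python
import torch
import torch.nn as nn


class DenoisingNetwork(nn.Module):
    """Predicts clean face-motion parameters from a noisy sample, conditioned
    on audio features, a style code, and the diffusion timestep (assumed MLP)."""

    def __init__(self, motion_dim=64, audio_dim=256, style_dim=128, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + audio_dim + style_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, noisy_motion, audio_feat, style_code, t):
        # Naive scalar timestep embedding; real models typically use sinusoidal embeddings.
        t_emb = t.float().unsqueeze(-1) / 1000.0
        x = torch.cat([noisy_motion, audio_feat, style_code, t_emb], dim=-1)
        return self.net(x)


class StylePredictor(nn.Module):
    """Diffusion-based module that predicts a speaking-style code directly
    from audio, removing the need for reference videos or text."""

    def __init__(self, audio_dim=256, style_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + style_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, style_dim),
        )

    def forward(self, noisy_style, audio_clip_feat, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0
        x = torch.cat([noisy_style, audio_clip_feat, t_emb], dim=-1)
        return self.net(x)


class StyleAwareLipExpert(nn.Module):
    """Scores how well lip motions match the audio under a given speaking
    style; such a score can guide lip-sync during training."""

    def __init__(self, motion_dim=64, audio_dim=256, style_dim=128, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(motion_dim + audio_dim + style_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, lip_motion, audio_feat, style_code):
        return self.score(torch.cat([lip_motion, audio_feat, style_code], dim=-1))
```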
The diffusion-based denoising network is the core component: driven by audio cues, it consistently synthesizes high-quality face motions that reflect diverse expressions. To improve the expressiveness and accuracy of lip motions, a style-aware lip expert guides lip-sync while remaining sensitive to the speaking style. Finally, an additional diffusion-based style predictor infers the target expression directly from the audio input, removing the need for expression reference videos or text.
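Putting the pieces together, the two-stage inference flow described above (audio-only style prediction followed by style-conditioned motion denoising) might look like the simplified sketch below, reusing the illustrative modules from the previous snippet. The naive update rule stands in for a proper DDPM/DDIM sampler, and all shapes and feature extractors are assumptions rather than DreamTalk's actual pipeline:

```python
import torch


@torch.no_grad()
def generate_motion(style_predictor, denoiser, audio_feat, audio_clip_feat,
                    num_steps=50, motion_dim=64, style_dim=128):
    """Illustrative two-stage inference: predict a style code from audio,
    then denoise per-frame face motions conditioned on audio and that style."""
    # Stage 1: diffusion-based style prediction from a clip-level audio feature.
    style = torch.randn(audio_clip_feat.shape[0], style_dim)
    for t in reversed(range(num_steps)):
        t_batch = torch.full((style.shape[0],), t)
        # Simplified update: step toward the predicted clean style code
        # (a real sampler would follow a DDPM/DDIM noise schedule).
        style = 0.5 * style + 0.5 * style_predictor(style, audio_clip_feat, t_batch)

    # Stage 2: audio- and style-conditioned denoising of per-frame face motions.
    motion = torch.randn(audio_feat.shape[0], motion_dim)
    style_frames = style.expand(motion.shape[0], -1)  # broadcast style over frames
    for t in reversed(range(num_steps)):
        t_batch = torch.full((motion.shape[0],), t)
        motion = 0.5 * motion + 0.5 * denoiser(motion, audio_feat, style_frames, t_batch)
    return motion  # face-motion parameters driving the final renderer


# Example usage with the illustrative modules above (shapes are assumptions):
# denoiser, style_predictor = DenoisingNetwork(), StylePredictor()
# audio_feat = torch.randn(120, 256)     # per-frame audio features
# audio_clip_feat = torch.randn(1, 256)  # clip-level audio feature
# motion = generate_motion(style_predictor, denoiser, audio_feat, audio_clip_feat)
```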