AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Author: NewsCrawler
Published: 3/28/2024, 4:52:50 PM
Category: Resource

Framework for generating facial motion animation from audio and a reference portrait image

Paper

Code

https://github.com/Zejun-Yang/AniPortrait

AniPortrait is an animation generation framework composed of two modules: Audio2Lmk and Lmk2Video. The Audio2Lmk module uses wav2vec to extract audio features and employs separate networks to predict a 3D facial mesh and a head pose from the audio. These predictions are then projected into 2D facial landmarks for the next stage.
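
The step from a 3D mesh and head pose to 2D landmarks can be illustrated with a minimal sketch. It assumes a simple pinhole camera and an Euler-angle head pose; the function names, the angle convention, and the camera parameters (`focal`, `center`) are hypothetical and are not taken from the AniPortrait code.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(yaw, pitch, roll):
    """Build a rotation matrix R = Rz(roll) * Ry(yaw) * Rx(pitch), angles in radians.

    The angle order is an assumption made for this sketch.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))

def project_landmarks(vertices, rotation, translation, focal, center):
    """Apply the head pose to 3D vertices, then project them to pixel coordinates."""
    points = []
    for vertex in vertices:
        # Rigid transform: camera-space point = R * vertex + t
        xc, yc, zc = (
            sum(rotation[i][k] * vertex[k] for k in range(3)) + translation[i]
            for i in range(3)
        )
        # Pinhole projection: divide by depth, scale by focal length, shift to image center
        points.append((focal * xc / zc + center[0], focal * yc / zc + center[1]))
    return points
```

Calling `project_landmarks(mesh, rotation_matrix(yaw, pitch, roll), t, focal, center)` once per audio frame would yield the per-frame 2D landmark sequence that the second stage consumes.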

The Lmk2Video module takes a reference image and the 2D facial landmarks as input and generates a sequence of photorealistic portrait frames. It builds on the Stable Diffusion 1.5 (SD1.5) model and a motion module to maintain temporal consistency and visual fidelity across frames. Its PoseGuider module is enhanced with a multi-scale strategy and cross-attention mechanisms, which improve lip movement accuracy and overall animation quality.
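
For readers unfamiliar with cross-attention, the following is a generic scaled dot-product sketch, not the PoseGuider implementation. It assumes, purely for illustration, that target-frame landmark features act as queries and reference landmark features act as keys and values; it also omits the learned projection layers a real network would use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity between this query and every key
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value rows
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

Each output row is a mixture of the reference features, weighted by how closely they match the corresponding target feature, which is the general way cross-attention lets one set of features condition another.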
