AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Framework for generating facial motion animation from audio and a reference portrait image
AniPortrait is an animation generation framework composed of two modules: Audio2Lmk and Lmk2Video. The Audio2Lmk module extracts audio features with wav2vec and uses separate networks to predict a sequence of 3D facial meshes and a sequence of head poses from the audio. These predictions are then projected into 2D facial landmarks, which serve as the input to the next stage.
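The step from a 3D mesh and head pose to 2D landmarks can be illustrated with a simple perspective projection. This is a minimal sketch, not the authors' implementation: the function names, the Euler-angle convention, the focal length, and the image size are all assumptions chosen for illustration.

```python
import numpy as np


def euler_to_rotation(yaw, pitch, roll):
    """Build a 3x3 rotation matrix from Euler angles in radians.

    The composition order (roll @ pitch @ yaw) is an assumed convention.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    r_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_pitch = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    r_roll = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return r_roll @ r_pitch @ r_yaw


def project_mesh_to_landmarks(vertices, rotation, translation,
                              focal=1000.0, image_size=512):
    """Project (N, 3) mesh vertices to (N, 2) pixel coordinates.

    `focal` and `image_size` are hypothetical camera parameters.
    """
    # Apply the head pose: rotate each vertex, then translate into camera space.
    cam = vertices @ rotation.T + translation
    # Perspective divide by depth, then shift to the image center.
    xy = focal * cam[:, :2] / cam[:, 2:3]
    return xy + image_size / 2.0


# Usage: two vertices, frontal pose, head placed 10 units in front of the camera.
mesh = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pose = euler_to_rotation(0.0, 0.0, 0.0)
landmarks = project_mesh_to_landmarks(mesh, pose, np.array([0.0, 0.0, 10.0]))
```

In this sketch the head pose is applied as a rigid transform before projection, so the same mesh yields different 2D landmarks as the predicted pose changes from frame to frame.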
The Lmk2Video module takes a reference image and the 2D facial landmark sequence as input and generates a sequence of photorealistic portrait frames. It builds on the Stable Diffusion 1.5 (SD1.5) model with an added motion module to maintain temporal consistency and visual fidelity across the generated frames. In addition, the PoseGuider module is enhanced with a multi-scale strategy and cross-attention mechanisms, which improve lip movement accuracy and overall animation quality.
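To make the cross-attention idea concrete, the following is a minimal sketch of scaled dot-product cross-attention in NumPy. It is not the PoseGuider itself: which features act as queries and which as keys and values is an assumption here (target-frame landmark features attend over reference-image landmark features), and the learned projection matrices of a real attention layer are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(target_feats, reference_feats):
    """Let (T, d) target features attend over (R, d) reference features.

    Returns the attended (T, d) features and the (T, R) attention weights.
    Learned query/key/value projections are left out of this sketch.
    """
    d = target_feats.shape[-1]
    # Similarity of every target token to every reference token.
    scores = target_feats @ reference_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted mix of reference features.
    return weights @ reference_feats, weights


# Usage with random stand-in features (hypothetical sizes).
rng = np.random.default_rng(0)
target = rng.normal(size=(5, 8))     # e.g. 5 target-frame landmark tokens
reference = rng.normal(size=(7, 8))  # e.g. 7 reference-image landmark tokens
attended, attn = cross_attention(target, reference)
```

Each row of `attn` sums to one, so every target token receives a convex combination of reference features. A multi-scale variant would repeat this at several feature resolutions.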