DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Multi-modal 3D dataset combining dance, camera movement, and music audio for dance camera synthesis
The 3D Dance-Camera-Music dataset (DCM) is a new multi-modal dataset that pairs camera keyframes and movements with music and dance to advance the study of dance cinematography. It consists of 108 dance sequences of paired dance-camera-music data collected from the anime community, covering music in four languages. Built on this dataset, DanceCamera3D is a transformer-based diffusion network and the first model that can robustly synthesize camera movement given music and dance. To better balance the influence of music and dance motion on camera movement, the paper proposes a strong-weak condition separation strategy for classifier-free guidance (CFG) and introduces a new body attention loss that helps DanceCamera3D focus on different limb parts. Additionally, the paper devises new metrics that account for shot features and fidelity to the dancing character, both of which are significant in dance cinematography.
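To make the strong-weak condition separation concrete, here is a minimal sketch of a two-scale classifier-free guidance step. The `denoiser` signature, the choice of dance motion as the strong condition and music as the weak condition, and the guidance scales `s_strong`/`s_weak` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def guided_denoise(denoiser, x_t, t, cond_dance, cond_music,
                   s_strong=2.0, s_weak=1.0):
    """Sketch of CFG with strong-weak condition separation.

    Hypothetical setup: dance motion is the strong condition, music the
    weak one; each gets its own guidance scale so their influence on
    the synthesized camera movement can be balanced independently.
    """
    # Unconditional prediction: both conditions dropped (set to None).
    eps_uncond = denoiser(x_t, t, dance=None, music=None)
    # Strong-condition-only prediction: dance kept, music dropped.
    eps_strong = denoiser(x_t, t, dance=cond_dance, music=None)
    # Fully conditioned prediction: both dance and music kept.
    eps_full = denoiser(x_t, t, dance=cond_dance, music=cond_music)

    # Compose: scale the strong (dance) contribution and the additional
    # weak (music) contribution separately.
    return (eps_uncond
            + s_strong * (eps_strong - eps_uncond)
            + s_weak * (eps_full - eps_strong))
```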
This work also reviews the construction of 2D and 3D music-dance datasets and the challenges of dance cinematography, emphasizing its complexity relative to dance synthesis or conventional cinematography. It introduces a novel Music-Dance-driven Camera Movement Synthesis task, which aims to automatically synthesize camera movement given music and dance, and presents DanceCamera3D as the first model for this task. The body attention loss (L_ba) is described in detail: it penalizes joints that are inside the camera view in the ground truth but outside the camera view in the synthesized result.
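As a concrete illustration of L_ba, here is a minimal sketch, assuming joints are already projected into normalized device coordinates under each camera and that visibility means both coordinates lie in [-1, 1]. The tensor shapes, the visibility test, and the soft out-of-view penalty are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def body_attention_loss(joints_pred_2d, joints_gt_2d):
    """Sketch of a body attention loss.

    Penalizes joints that fall inside the camera view in the ground
    truth but outside the view under the synthesized camera.

    Args:
        joints_pred_2d: (B, T, J, 2) joints projected with the
            synthesized camera, in normalized device coordinates.
        joints_gt_2d:   (B, T, J, 2) joints projected with the
            ground-truth camera.
    """
    # A joint is visible if both of its coordinates lie in [-1, 1].
    gt_visible = (joints_gt_2d.abs() <= 1.0).all(dim=-1).float()  # (B, T, J)

    # Distance by which each synthesized joint leaves the view
    # frustum; zero when the joint stays inside the view.
    out_of_view = (joints_pred_2d.abs() - 1.0).clamp(min=0.0).sum(dim=-1)

    # Only joints visible in the ground truth contribute to the loss.
    return (gt_visible * out_of_view).mean()
```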