DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Generates dynamic 3D scenes from monocular video, tracking multiple objects with large motions and rendering novel viewpoints via a "decompose-then-recompose" scheme.
DreamScene4D presents a novel approach to video-to-4D scene generation: producing dynamic 3D scenes from complex multi-object videos while handling occlusions, large object motions, and unseen viewpoints with temporal and spatial consistency. The video scene is decomposed into the background and individual object tracks, and a motion factorization scheme handles fast-moving objects in multi-object scenes. Dynamic Gaussian splatting then represents the object dynamics observed in the video.
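The decompose-then-recompose idea can be illustrated with a toy sketch: split the video into per-object streams and a background residual using segmentation masks, process each stream independently, then composite them back. The function name, array shapes, and recomposition step below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def decompose_then_recompose(video, masks):
    """Toy sketch of decompose-then-recompose: separate a video into
    per-object streams via masks plus a background residual, then
    recomposite. Shapes and helpers are illustrative assumptions."""
    # video: (T, H, W, 3) float array; masks: (N, T, H, W) binary masks
    objects = [video * m[..., None] for m in masks]          # per-object streams
    background = video * (1 - masks.max(axis=0))[..., None]  # background residual
    # ... in the full method, each stream would be lifted to 4D independently ...
    recomposed = background + sum(objects)                   # naive recomposition
    return recomposed

T, H, W = 2, 4, 4
video = np.ones((T, H, W, 3))
masks = np.zeros((1, T, H, W))
masks[0, :, :2] = 1.0  # one object covering the top half of every frame
out = decompose_then_recompose(video, masks)
print(np.allclose(out, video))  # True: exact recomposition with non-overlapping masks
```

With non-overlapping masks, the decomposition is lossless, which is what lets each stream be optimized in isolation before recomposition.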
To achieve video scene decomposition and completion, the method uses zero-shot mask trackers to segment and track objects in the monocular input video. For amodal video completion, it adapts a Stable Diffusion model, whose rich image inpainting priors are extended to video so that object appearance can be recovered consistently in occluded regions. Spatial-temporal self-attention and latent consistency guidance further improve inpainting quality and enforce temporal consistency across the inpainted frames.
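The effect of a temporal consistency term can be sketched in isolation: treat each frame's latent as a vector and take a gradient step that pulls it toward its temporal neighbors, suppressing frame-to-frame flicker. This is a minimal illustration assuming a simple squared-difference penalty; the paper's guidance operates inside the diffusion sampling loop, not as a standalone post-process.

```python
import numpy as np

def latent_consistency_step(latents, lr=0.1):
    """Toy sketch of latent consistency guidance: one gradient step on
    sum_t ||z_t - z_{t-1}||^2, nudging each frame latent toward its
    temporal neighbors. Illustrative assumption, not the paper's code."""
    z = latents.copy()                       # latents: (T, D) per-frame codes
    grad = np.zeros_like(z)
    grad[1:] += 2 * (z[1:] - z[:-1])         # pull toward previous frame
    grad[:-1] += 2 * (z[:-1] - z[1:])        # pull toward next frame
    return z - lr * grad

z = np.array([[0.0], [1.0], [0.0]])          # a flickering middle frame
z_smooth = latent_consistency_step(z)
before = float(np.abs(np.diff(z, axis=0)).sum())
after = float(np.abs(np.diff(z_smooth, axis=0)).sum())
print(after < before)  # True: temporal variation is reduced
```

Repeating such steps (or folding the gradient into the denoising updates, as guidance methods do) drives neighboring frames toward agreement without forcing them to be identical.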
For video-to-4D scene generation, the method factorizes the scene dynamics into three components: object-centric motion, an object-centric-to-world-frame transformation, and camera motion. Each component is modeled and optimized independently, and the three are composed to reproduce the object dynamics observed in the video. Object-centric motion is optimized with a deformation network, and 3D object motion is computed in object-centric frames, which makes fast-moving objects tractable.
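The three-way factorization can be written as a composition of transforms: first deform points in the object-centric frame, then map them into the world frame, then apply the camera motion. The function names and 4x4 homogeneous-matrix conventions below are assumptions for illustration; the actual method optimizes these components jointly over Gaussians, not isolated points.

```python
import numpy as np

def compose_dynamics(points_obj, deform_offset, obj_to_world, world_to_cam):
    """Toy sketch of the three-component motion factorization:
    (1) object-centric deformation, (2) object-to-world transform,
    (3) camera motion. Conventions here are illustrative assumptions."""
    pts = points_obj + deform_offset              # (1) non-rigid object motion
    pts_h = np.c_[pts, np.ones(len(pts))]         # homogeneous coordinates
    pts_world = (obj_to_world @ pts_h.T).T        # (2) place object in the world
    pts_cam = (world_to_cam @ pts_world.T).T      # (3) apply camera motion
    return pts_cam[:, :3]

def translation(t):
    """Build a 4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

pts = np.zeros((1, 3))                            # one point at the object origin
out = compose_dynamics(pts,
                       np.array([[0.0, 0.0, 1.0]]),   # deformation along z
                       translation([2.0, 0.0, 0.0]),  # object moves 2 units in world x
                       translation([0.0, -1.0, 0.0])) # camera shift
print(out)  # [[ 2. -1.  1.]]
```

Because each transform is modeled separately, large world-frame displacements of a fast-moving object reduce to a small deformation plus a rigid transform, which is far easier to optimize.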