RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting
A 3D inpainting method that provides explicit control over the inpainted content by adapting a diffusion model to a reference image
RefFusion is a novel 3D inpainting method that adapts an image inpainting diffusion model to a given reference view through multi-scale personalization. This personalization enhances control over the inpainted content and reduces the variance of the score distillation objective, yielding sharper details. The framework is designed to address challenges such as the trade-off between diversity and multi-view consistency, fidelity to observed content, and conflicting gradients across inpainted views.
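To make the personalization step concrete, the sketch below shows how multi-scale LoRA fine-tuning on the reference view might look. The module interfaces (`unet`, `vae`, `scheduler`) are diffusers-style assumptions and `random_scaled_crop` is a hypothetical helper; this is an illustration under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def personalize_to_reference(unet, vae, scheduler, ref_image, ref_mask, text_emb,
                             scales=(1.0, 0.75, 0.5), steps=500, lr=1e-4):
    """Fine-tune the LoRA parameters of an inpainting LDM on multi-scale crops of
    the reference view so that later score distillation stays anchored to it.
    Module interfaces are diffusers-style assumptions, not the paper's code."""
    lora_params = [p for p in unet.parameters() if p.requires_grad]  # only LoRA weights are trainable
    opt = torch.optim.AdamW(lora_params, lr=lr)

    for step in range(steps):
        # Cycle through crop scales so the adapter sees global context and fine detail
        crop, mask = random_scaled_crop(ref_image, ref_mask,
                                        scale=scales[step % len(scales)])  # hypothetical helper
        crop = F.interpolate(crop, size=(512, 512), mode="bilinear", align_corners=False)
        mask = F.interpolate(mask, size=(512, 512), mode="nearest")

        # Standard latent-diffusion denoising objective on the crop
        latents = vae.encode(crop).latent_dist.sample() * 0.18215
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noisy = scheduler.add_noise(latents, noise, t)

        # SD-inpainting-style conditioning: concatenate the noisy latents, the
        # downsampled mask, and the masked-image latents along the channel axis
        mask_lat = F.interpolate(mask, size=latents.shape[-2:], mode="nearest")
        masked_lat = vae.encode(crop * (1 - mask)).latent_dist.sample() * 0.18215
        pred = unet(torch.cat([noisy, mask_lat, masked_lat], dim=1), t,
                    encoder_hidden_states=text_emb).sample

        loss = F.mse_loss(pred, noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return unet
```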
RefFusion is built on continuous distillation of the reference-adapted diffusion model, using the multi-scale personalization method to adapt the model to the reference view. 3D Gaussian splatting is used to consolidate noisy 2D masks and to direct gradients to the relevant regions, and a combination of objective terms makes the SDS optimization procedure applicable at the scene level. The method is accompanied by a new dataset for evaluating object removal and 3D inpainting, featuring scenes with large camera motion.
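Under the same assumptions, a scene-level update might combine a masked SDS term from the reference-adapted model with a reconstruction term on observed content, as sketched below. Here `render` stands in for a 3D Gaussian splatting renderer and, like the other names, is a hypothetical placeholder rather than the paper's API.

```python
import torch
import torch.nn.functional as F

def scene_update_step(gaussians, camera, gt_image, inpaint_mask,
                      adapted_unet, vae, scheduler, text_emb,
                      optimizer, guidance_scale=7.5, lambda_recon=1.0):
    """One scene-level optimization step: the reference-adapted model supplies an
    SDS gradient inside the inpainting mask, while a reconstruction term preserves
    observed content outside it. `render` and all module interfaces are assumptions."""
    rendered = render(gaussians, camera)  # hypothetical 3DGS renderer, shape (1, 3, H, W)

    # --- SDS term inside the mask (gradients flow only through `rendered`) ---
    latents = vae.encode(rendered).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    t = torch.randint(20, 980, (1,), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)

    with torch.no_grad():
        mask_lat = F.interpolate(inpaint_mask, size=latents.shape[-2:], mode="nearest")
        masked_lat = vae.encode(rendered * (1 - inpaint_mask)).latent_dist.sample() * 0.18215
        model_in = torch.cat([noisy, mask_lat, masked_lat], dim=1)
        eps_cond = adapted_unet(model_in, t, encoder_hidden_states=text_emb).sample
        eps_uncond = adapted_unet(model_in, t,
                                  encoder_hidden_states=torch.zeros_like(text_emb)).sample
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    grad = (eps - noise) * mask_lat  # restrict the score signal to the masked region
    sds = 0.5 * F.mse_loss(latents, (latents - grad).detach(), reduction="sum")

    # --- Reconstruction term outside the mask keeps observed content fixed ---
    recon = F.l1_loss(rendered * (1 - inpaint_mask), gt_image * (1 - inpaint_mask))

    loss = sds + lambda_recon * recon
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```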
Implementation details cover the 3D Gaussian optimization hyperparameters, the LoRA optimization hyperparameters, and the training iterations for the adapted LDM. Reference views are generated with a 2D inpainting model, chosen for its speed and quality. Experiments demonstrate high visual quality, controllability, and diversity, and the generality of the formulation is shown through applications to object insertion, scene outpainting, and sparse view reconstruction.
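As an example of the reference-view generation step, an off-the-shelf 2D inpainting pipeline can produce the inpainted reference image; the checkpoint, file names, and prompt below are illustrative assumptions rather than the paper's settings.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Generate the inpainted reference view with a standard 2D inpainting model.
# Checkpoint, paths, and prompt are placeholders, not the paper's choices.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

ref_view = Image.open("reference_view.png").convert("RGB")  # observed reference frame
ref_mask = Image.open("reference_mask.png").convert("L")    # white = region to inpaint

reference = pipe(
    prompt="empty room, photorealistic",  # placeholder prompt describing the desired fill
    image=ref_view,
    mask_image=ref_mask,
    num_inference_steps=50,
).images[0]
reference.save("inpainted_reference.png")  # later used to personalize the diffusion model
```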