FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
Two-stage framework for single-image 3D reconstruction that addresses multi-view inconsistency and lack of geometric fidelity.
FDGaussian is a novel two-stage method for 3D object reconstruction. It combines an orthogonal plane decomposition mechanism with a diffusion model to synthesize multi-view consistent, geometric-aware novel-view images, and it leverages Gaussian splatting to build high-quality 3D representations efficiently, without explicit depth or normal hints. In the reconstruction stage, a UNet-like network maps input images to mixtures of Gaussians, with epipolar attention blocks facilitating communication between views.
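The pixels-to-Gaussians mapping can be pictured with a short PyTorch sketch: a toy UNet-style backbone produces a feature map, and a 1×1 convolution head reads out per-pixel Gaussian parameters. This is a minimal illustration, not the paper's actual architecture; the module names, channel sizes, and the 14-channel parameter split (position, scale, rotation, opacity, color) are all assumptions, and the epipolar attention blocks are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Toy UNet-style backbone: one downsample, one upsample, one skip connection."""
    def __init__(self, in_ch: int = 3, mid_ch: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.SiLU())
        self.down = nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(mid_ch, mid_ch, 2, stride=2)
        self.fuse = nn.Conv2d(2 * mid_ch, mid_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc(x)                       # (B, mid_ch, H, W)
        h = F.silu(self.down(skip))              # (B, mid_ch, H/2, W/2)
        h = self.up(h)                           # back to (B, mid_ch, H, W)
        return F.silu(self.fuse(torch.cat([h, skip], dim=1)))

class GaussianHead(nn.Module):
    """Reads one Gaussian per pixel out of the feature map."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        # 3 (position) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 3 (RGB)
        self.out = nn.Conv2d(in_ch, 14, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> dict:
        p = self.out(feats)                              # (B, 14, H, W)
        return {
            "position": p[:, 0:3],
            "scale":    torch.exp(p[:, 3:6]),            # keep scales positive
            "rotation": F.normalize(p[:, 6:10], dim=1),  # unit quaternion
            "opacity":  torch.sigmoid(p[:, 10:11]),
            "rgb":      torch.sigmoid(p[:, 11:14]),
        }

# Usage: one Gaussian is predicted for every pixel of the input image.
image = torch.randn(1, 3, 128, 128)
gaussians = GaussianHead()(TinyUNet()(image))
```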
During the generation stage, a Vision Transformer encodes the reference image into a feature map, which is then decoded by two decoders: one for the image plane and one for the orthogonal planes. Epipolar attention blocks again allow information to flow between views. Optimization uses the AdamW optimizer, and experiments are conducted on NVIDIA V100 GPUs.
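To make the two-decoder layout concrete, here is a hedged PyTorch sketch: a shared encoder feature map (standing in for the ViT output) is passed through one decoder for the image plane and one for the orthogonal planes. All class names, shapes, and the AdamW hyperparameters below are illustrative assumptions, not FDGaussian's actual implementation.

```python
import torch
import torch.nn as nn

class PlaneDecoder(nn.Module):
    """Small convolutional decoder over the shared encoder feature map."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

class OrthogonalPlaneDecomposition(nn.Module):
    """Decodes one feature set in the reference image plane and one for the
    planes orthogonal to it; both can then condition novel-view synthesis."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.image_plane_decoder = PlaneDecoder(dim)
        self.ortho_plane_decoder = PlaneDecoder(dim)

    def forward(self, encoder_feats: torch.Tensor):
        return (self.image_plane_decoder(encoder_feats),
                self.ortho_plane_decoder(encoder_feats))

# encoder_feats stands in for the ViT encoder's output feature map, reshaped
# to a spatial grid; the 16x16 resolution and 256 channels are assumptions.
encoder_feats = torch.randn(1, 256, 16, 16)
in_plane, ortho = OrthogonalPlaneDecomposition()(encoder_feats)

# The summary does not list the optimizer's parameters; these AdamW settings
# are common defaults, shown only to illustrate the setup.
model = OrthogonalPlaneDecomposition()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```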