GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Author: VRAMrod
Published: 5/1/2024, 4:09:27 PM
Category: Research

Transformer-based model capable of predicting high-quality 3D Gaussian primitives from sparse images, applicable to both object and scene captures.

Paper

https://arxiv.org/abs/2404.19702

Project

https://sai-bi.github.io/project/gs-lrm/

The paper presents GS-LRM, a Large Reconstruction Model (LRM) that reconstructs 3D objects and scenes as 3D Gaussian Splatting primitives. Unlike methods that rely on 3D cost volumes, it uses a multi-view transformer to regress Gaussians directly, so multi-view correspondence is reasoned about inside the transformer. Because the predicted Gaussians are pixel-aligned, the model effectively estimates a per-pixel depth together with the other Gaussian properties. This makes it particularly effective on highly sparse input views, which are a challenge for cost-volume-based methods.
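To make the link between pixel-aligned Gaussians and per-pixel depth concrete, here is a minimal sketch, not the authors' code. It assumes each pixel's Gaussian center lies on that pixel's camera ray at a predicted distance; the function name, argument names, and array layouts are hypothetical choices for illustration.

```python
import numpy as np

def unproject_pixel_aligned(ray_origins, ray_dirs, ray_distances):
    """Place one Gaussian center per pixel along that pixel's camera ray.

    ray_origins:   (H, W, 3) ray origin for each pixel (assumed layout)
    ray_dirs:      (H, W, 3) ray direction for each pixel (normalized below)
    ray_distances: (H, W)    predicted distance along each ray
    returns:       (H*W, 3)  Gaussian centers
    """
    # Normalize directions so the distance is measured in scene units.
    dirs = ray_dirs / np.linalg.norm(ray_dirs, axis=-1, keepdims=True)
    # center = origin + distance * direction, evaluated for every pixel.
    centers = ray_origins + ray_distances[..., None] * dirs
    return centers.reshape(-1, 3)

# Toy 2x2 image: camera at the origin, every ray pointing along +z.
origins = np.zeros((2, 2, 3))
directions = np.tile(np.array([0.0, 0.0, 2.0]), (2, 2, 1))
distances = np.full((2, 2), 3.0)
centers = unproject_pixel_aligned(origins, directions, distances)
```

In this toy case every center ends up three units in front of the camera, so the predicted distance plays the role of a depth value for its pixel.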

For radiance field reconstruction, the paper builds on Gaussian Splatting, a state-of-the-art technique for radiance field modeling and rendering that optimizes the representation through differentiable rendering and supports real-time rendering and large-scale scene reconstruction. The model merges the 3D Gaussians predicted from all input views, outputting N · HW Gaussians in total for N input views of H × W pixels each. Because the number of Gaussians grows with input resolution, the model can better handle high-frequency details and large-scale scene captures.
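The N · HW count can be checked with a few lines of arithmetic. In the sketch below, the view count, the resolutions, and the number of parameter channels per Gaussian are illustrative placeholders rather than values taken from the paper, and the helper names are hypothetical.

```python
import numpy as np

def total_gaussians(num_views, height, width):
    """One Gaussian per input pixel, summed over all input views."""
    return num_views * height * width

def merge_views(per_view_gaussians):
    """Concatenate per-view (H*W, C) parameter arrays into one (N*H*W, C) set."""
    return np.concatenate(per_view_gaussians, axis=0)

# Illustrative numbers only: 4 views at two square resolutions.
count_256 = total_gaussians(4, 256, 256)   # 262144
count_512 = total_gaussians(4, 512, 512)   # 1048576

# Merging three tiny 2x2 views, each with 5 placeholder channels per Gaussian.
views = [np.zeros((2 * 2, 5)) for _ in range(3)]
merged = merge_views(views)                # shape (12, 5)
```

Doubling the side length of the input images quadruples the number of Gaussians, which is the sense in which the representation scales with resolution.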

During training, the model renders images at multiple supervision views from the predicted Gaussian splats and minimizes an image reconstruction loss that combines Mean Squared Error (MSE) loss and perceptual loss. Layer Normalization and residual connections improve training stability. Training proceeds in two stages, pre-training at a lower resolution and then fine-tuning at a higher resolution, and uses Flash-Attention-v2, gradient checkpointing, and mixed-precision training for efficiency.
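A rough sketch of how such a combined loss might be assembled over several supervision views is shown below. It makes two loud assumptions: `toy_features` is a stand-in built from image gradients, not the pretrained network a real perceptual loss would use, and `perceptual_weight=0.5` is a hypothetical default rather than a value quoted from the paper.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between two (H, W, 3) images."""
    return float(np.mean((pred - target) ** 2))

def toy_features(img):
    """Stand-in feature extractor (assumption): horizontal and vertical gradients."""
    dx = img[:, 1:, :] - img[:, :-1, :]
    dy = img[1:, :, :] - img[:-1, :, :]
    return [dx, dy]

def perceptual_loss(pred, target, feature_fn=toy_features):
    """Compare images in feature space instead of pixel space."""
    feats_pred = feature_fn(pred)
    feats_target = feature_fn(target)
    return float(sum(np.mean((a - b) ** 2)
                     for a, b in zip(feats_pred, feats_target)))

def reconstruction_loss(rendered, ground_truth, perceptual_weight=0.5):
    """Average of MSE + weight * perceptual loss over all supervision views.

    rendered, ground_truth: (V, H, W, 3) arrays for V supervision views.
    """
    total = 0.0
    for pred, target in zip(rendered, ground_truth):
        total += mse_loss(pred, target)
        total += perceptual_weight * perceptual_loss(pred, target)
    return total / len(rendered)

rng = np.random.default_rng(0)
target_views = rng.random((2, 8, 8, 3))
perfect = reconstruction_loss(target_views, target_views)
noisy = reconstruction_loss(target_views + 0.1 * rng.random((2, 8, 8, 3)),
                            target_views)
```

A perfect rendering gives zero loss, and any deviation from the supervision images raises both terms.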

The model architecture does not include bias terms, and its weights are initialized from a normal distribution. Training uses the AdamW optimizer. The paper also provides pseudocode for GS-LRM that implements the method described in its main section, along with Gaussian parametrization details. Overall, the approach offers a robust framework for 3D reconstruction, combining transformer-based regression of Gaussians with Gaussian Splatting for efficient radiance field modeling and rendering.
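For readers unfamiliar with the training details mentioned above, the following is a small NumPy sketch of a bias-free linear layer with normally distributed weights and a single AdamW update with decoupled weight decay. All hyperparameters (`std`, `lr`, `beta1`, `beta2`, `eps`, `weight_decay`) are hypothetical defaults chosen for illustration, not settings reported in the paper, and the function names are made up for this example.

```python
import numpy as np

def init_linear_no_bias(in_dim, out_dim, std=0.02, seed=0):
    """Weight matrix drawn from N(0, std^2); there is no bias vector at all."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, std, size=(in_dim, out_dim))

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update; t is the 1-based step count."""
    # Update biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the moments.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay is applied to the weights directly,
    # separately from the gradient-based Adam step.
    w = w - lr * weight_decay * w
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

weights = init_linear_no_bias(16, 8)
x = np.ones((4, 16))
y = x @ weights                      # forward pass: no bias term is added

# With a zero gradient, only the decoupled weight decay changes the weights.
zero_grad = np.zeros_like(weights)
new_w, m, v = adamw_step(weights, zero_grad,
                         np.zeros_like(weights), np.zeros_like(weights), t=1)
```

The zero-gradient case isolates what distinguishes AdamW from Adam with L2 regularization: the weights shrink by a factor of `1 - lr * weight_decay` regardless of the gradient statistics.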