MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Efficient 3D Gaussian splatting model that leverages sparse multi-view images, localizing Gaussian centers through accurate depth estimation
MVSplat is an efficient feed-forward model that predicts 3D Gaussians from sparse multi-view images. The method leverages 3D Gaussian Splatting (3DGS) as an efficient and expressive 3D representation, enabling fast rendering and high-quality 3D reconstruction and novel view synthesis. MVSplat is designed to avoid the expensive per-scene optimization, high memory cost, and slow rendering of existing neural scene representations. The method is implemented in PyTorch with an off-the-shelf 3DGS renderer written in CUDA, and its multi-view Transformer contains 6 stacked self- and cross-attention layers. The model is trained on a single A100 GPU for 300,000 iterations with the Adam optimizer.
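To make the architecture description concrete, below is a minimal sketch of a multi-view Transformer with 6 stacked self- and cross-attention layers, written against standard PyTorch modules. All class and parameter names (AttentionBlock, MultiViewTransformer, embed_dim, num_heads) are hypothetical illustrations, not MVSplat's actual API, and the two-view weight sharing is an assumption.

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One self-attention + one cross-attention step over per-view tokens."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Self-attention within one view's own tokens.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention: queries from this view, keys/values from the other view,
        # which is where cross-view matching information is exchanged.
        h = self.norm2(x)
        x = x + self.cross_attn(h, other, other, need_weights=False)[0]
        return x


class MultiViewTransformer(nn.Module):
    """Stack of 6 alternating self-/cross-attention blocks over two views."""

    def __init__(self, embed_dim: int = 128, num_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(AttentionBlock(embed_dim) for _ in range(num_layers))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a, feat_b: (batch, tokens, embed_dim) flattened image features.
        for layer in self.layers:
            # Apply the shared block symmetrically to both views.
            feat_a, feat_b = layer(feat_a, feat_b), layer(feat_b, feat_a)
        return feat_a, feat_b
```

The same block is applied to both views with shared weights, one natural choice for a symmetric two-view setting; the resulting cross-view-aware features then feed the cost volume described below.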
MVSplat is compared with several representative feed-forward methods for scene-level novel view synthesis from sparse views and outperforms them on visual quality metrics. Ablations highlight the cost volume as the cornerstone of the model: it plays a fundamental role in producing better geometry, and removing it causes a significant drop in visual quality metrics. Cross-view matching and the cost volume refinement U-Net are likewise identified as key components, responsible for learning multi-view geometry and refining the initial cost volume, respectively.
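As a rough illustration of the role the ablation attributes to the cost volume, the sketch below builds a plane-sweep cost volume by correlating reference-view features with source-view features pre-warped to candidate depth planes, then converts it to a per-pixel depth estimate. The function names, the `near`/`far` range, and the assumption that warped features are precomputed (the homography warping itself is omitted) are all hypothetical simplifications, not MVSplat's actual implementation.

```python
import torch


def build_cost_volume(ref_feat: torch.Tensor, warped_src_feats: torch.Tensor) -> torch.Tensor:
    """ref_feat: (B, C, H, W) reference-view features.
    warped_src_feats: (B, D, C, H, W) source features pre-warped onto the
    reference view at D candidate depth planes (warping omitted here)."""
    B, C, H, W = ref_feat.shape
    # Per-plane matching cost: channel-wise dot product, normalized by C.
    cost = (warped_src_feats * ref_feat.unsqueeze(1)).sum(dim=2) / C  # (B, D, H, W)
    return cost


def depth_from_cost(cost: torch.Tensor, near: float = 1.0, far: float = 100.0) -> torch.Tensor:
    B, D, H, W = cost.shape
    # Candidate depths sampled uniformly in inverse depth between near and far.
    inv_depths = torch.linspace(1.0 / far, 1.0 / near, D, device=cost.device)
    # Softmax over the depth axis gives a per-pixel depth distribution;
    # its expectation yields a soft depth estimate for each pixel.
    prob = cost.softmax(dim=1)
    inv_depth = (prob * inv_depths.view(1, D, 1, 1)).sum(dim=1)
    return 1.0 / inv_depth  # (B, H, W) expected depth
```

The resulting depth map is what localizes the Gaussian centers; in the full model, a refinement U-Net would further process the initial cost volume before this depth readout.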