CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
High-fidelity feed-forward single image-to-3D generative model that leverages geometric priors for efficient generation of textured meshes.
Convolutional Reconstruction Model (CRM) is a feed-forward framework for generating high-quality 3D models from a single image. It exploits the spatial correspondence between the input images and the output triplane, which improves the quality of the resulting textured meshes. Unlike previous transformer-based methods, CRM is trained end to end and directly outputs textured meshes. The model produces a detailed textured mesh in about 10 seconds, and the approach is also reported to significantly reduce training costs.
The paper introduces the design of the multi-view diffusion models, which are essential components of CRM. The authors add the proposed training techniques one at a time and examine the results on a subset of the dataset, comparing the generated novel-view images with the ground-truth images using several similarity metrics. The results show that the Zero-SNR trick and random resizing improve the quantitative metrics, while contour augmentation does not. However, the authors find that contour augmentation makes the model more robust to diverse input images.
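To make the Zero-SNR trick more concrete, here is a minimal sketch of one common way to enforce a zero terminal signal-to-noise ratio: rescaling a diffusion noise schedule so that the final timestep is pure noise. The summary does not say how CRM implements the trick, so this follows the schedule rescaling described by Lin et al. ("Common Diffusion Noise Schedules and Sample Steps are Flawed") rather than the authors' code. The linear schedule, its default values, and all function names are illustrative assumptions.

```python
import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule, used here only as example input."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def snr(alpha_bar):
    """Signal-to-noise ratio of a timestep: alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has alpha_bar = 0 (SNR = 0).

    Assumed procedure (after Lin et al.), not taken from the CRM paper:
    shift sqrt(alpha_bar) so its last value is zero, then scale it so
    its first value is unchanged.
    """
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]

    # Convert the rescaled cumulative products back to per-step betas.
    alphas_bar = [s * s for s in sqrt_ab]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    original = linear_betas()
    rescaled = rescale_zero_terminal_snr(original)
    print("terminal SNR before:", snr(cumulative_alphas(original)[-1]))
    print("terminal SNR after: ", snr(cumulative_alphas(rescaled)[-1]))
```

With an unmodified schedule the last training timestep still carries a small amount of signal, while sampling starts from pure noise. Rescaling removes that mismatch, which is the usual motivation for this kind of trick.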