Newsletter

Generating AI paintings from a single sentence is old news! Now NVIDIA’s Magic3D lets you generate 3D models from one sentence

Nvidia’s latest research in AI generative models goes one dimension beyond the rest: generating 3D models from a sentence-long description.

We live in a 3D world, and although most applications today are in 2D, there has always been a high demand for 3D digital content, including applications such as games, entertainment, construction and robotics simulations.

However, creating professional 3D content requires artistic and aesthetic sensibility as well as extensive 3D modeling expertise, and developing these skills takes considerable time and effort.

Demand is high and the work is labor-intensive, so can it be handed off to AI? On Friday, a paper Nvidia submitted to the preprint platform arXiv attracted attention.

As with the now-popular NovelAI, users only need to enter a piece of text such as “a blue poison dart frog sitting on a water lily,” and the AI generates a 3D model for you with complete texture and shape.

Magic3D can also perform prompt-based 3D mesh editing: given a low-resolution 3D model and a base prompt, the text can be changed to modify the content of the resulting model. The authors also demonstrate the ability to preserve style across prompts and to apply the style of a 2D image to a 3D model.

Stable Diffusion was first presented only in August 2022; that the field has evolved this far within a few months is a striking measure of the pace of technological development.

Nvidia says that with only minor modifications on this basis, the resulting models can be used as assets for games or CGI art scenes.

Text-to-3D generation itself is not new: on September 29, Google released DreamFusion, a text-to-3D generation model, and Magic3D builds directly on that method.

Like DreamFusion, which generates a 2D image from text and optimizes it into NeRF (Neural Radiance Field) volumetric data, Magic3D uses a two-stage generation method: a coarse model is produced at low resolution and then optimized to a higher resolution.

Nvidia’s approach first uses a low-resolution diffusion prior to obtain a coarse model, accelerated with a sparse 3D hash-grid structure. Starting from this coarse representation, it then further optimizes a textured 3D mesh model using an efficient differentiable renderer coupled with a high-resolution latent diffusion model.

Magic3D can create a high-quality 3D mesh model in 40 minutes, twice as fast as DreamFusion (which takes 1.5 hours on average), and at higher resolution. In the paper’s user study, 61.7% of raters preferred Nvidia’s new approach over DreamFusion.

Together with image-conditioned generation capabilities, the new technique opens up new avenues for a variety of creative applications.

Paper download link: Magic3D: Creating High Resolution Text-to-3D Content

Technical details

Magic3D can synthesize highly detailed 3D models from text prompts in a short computation time. It achieves this high-quality text-to-3D synthesis by improving on several major design choices in DreamFusion.

Specifically, Magic3D is a coarse-to-fine optimization method in which diffusion priors at multiple resolutions are used to optimize the 3D representation, yielding both view-consistent geometry and high-resolution detail. This supervision lets Magic3D synthesize 3D content at 8x higher resolution than DreamFusion while also being 2x faster.
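The coarse-to-fine idea can be illustrated with a deliberately simple NumPy sketch: fit a low-resolution version of a target first, upsample the solution, then refine it at full resolution. This is only an analogy, not Magic3D’s code; Magic3D optimizes 3D scene representations under diffusion-model guidance rather than a 1D signal.

```python
import numpy as np

# Toy analogy of coarse-to-fine optimization (not Magic3D's actual code):
# fit a coarse version of a target signal, upsample, then refine.

x_coarse = np.linspace(0, 1, 16)
x_full = np.linspace(0, 1, 64)
target_coarse = np.sin(2 * np.pi * x_coarse)   # low-res view of the "scene"
target_full = np.sin(2 * np.pi * x_full)       # full-res "scene" to recover

def fit(params, target, steps, lr=0.5):
    """Gradient descent on mean squared error (analytic gradient here)."""
    for _ in range(steps):
        grad = 2 * (params - target) / len(params)
        params = params - lr * grad
    return params

# Stage 1: cheap optimization of a low-resolution representation.
coarse = fit(np.zeros(16), target_coarse, steps=200)

# Stage 2: upsample the coarse result and refine at full resolution.
upsampled = np.interp(x_full, x_coarse, coarse)
fine = fit(upsampled, target_full, steps=100)

# Refinement starts close to the answer, so few high-res steps are needed.
print(float(np.mean((fine - target_full) ** 2)) < 1e-4)  # True
```

Starting the expensive high-resolution stage from an already-good coarse solution, rather than from scratch, is what buys Magic3D its speedup over a single-stage approach.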

The entire Magic3D workflow is divided into two stages. In the first stage, the method optimizes a coarse neural-field representation similar to DreamFusion’s, but achieves an efficient scene representation whose memory use and computation are accelerated by a grid-based hash encoding.
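To give a rough feel for the hash-grid idea, here is a toy multiresolution hash encoding in NumPy, in the spirit of Instant NGP (which this style of representation derives from). The table size, feature dimension, and resolutions below are illustrative assumptions, not Nvidia’s implementation.

```python
import numpy as np

# Toy sketch of a multiresolution hash-grid encoding (illustrative only):
# a 3D point is mapped, at several grid resolutions, to a small feature
# vector stored in a fixed-size hash table of learnable features.

TABLE_SIZE = 2 ** 14   # hash-table entries per level (real systems use far more)
FEATURE_DIM = 2        # learnable features stored per entry
LEVELS = [16, 32, 64]  # grid resolutions, coarse to fine

rng = np.random.default_rng(0)
# One table of (initially random) learnable features per resolution level.
tables = [rng.normal(size=(TABLE_SIZE, FEATURE_DIM)) for _ in LEVELS]

# Large primes for hashing integer grid coordinates, as in Instant NGP.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_coords(cell):
    """Hash an integer 3D grid coordinate into a table index."""
    h = np.bitwise_xor.reduce(cell.astype(np.uint64) * PRIMES)
    return int(h % TABLE_SIZE)

def encode(point):
    """Concatenate per-level features for a point in [0, 1)^3.
    (Nearest-corner lookup; the real method trilinearly interpolates.)"""
    feats = []
    for res, table in zip(LEVELS, tables):
        cell = np.floor(np.asarray(point) * res).astype(np.int64)
        feats.append(table[hash_coords(cell)])
    return np.concatenate(feats)

vec = encode([0.3, 0.7, 0.1])
print(vec.shape)  # (6,): 3 levels x 2 features each
```

Because lookups are O(1) per level and the tables are small, this kind of encoding is far cheaper in memory and compute than a dense 3D voxel grid, which is what makes the coarse stage fast.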

In the second stage, the method switches to optimizing a mesh representation. This step is critical: it lets the method exploit diffusion priors at resolutions up to 512 × 512. Since 3D meshes are well suited to fast graphics rendering, high-resolution images can be rendered on the fly, so the study uses an efficient rasterization-based differentiable renderer and close-up camera views to recover high-frequency details in geometry and texture.

Based on the two stages above, the method produces highly realistic 3D content that can be easily imported into and visualized in standard graphics software.

Furthermore, the study demonstrates creative control over the 3D synthesis process via text prompts, as shown in Figure 1 below.

To compare the methods in practice, Nvidia researchers compared content produced by Magic3D and DreamFusion on 397 text prompts. The coarse-model stage took an average of 15 minutes and the fine stage another 25 minutes; all run times were measured on 8 Nvidia A100 GPUs.

Although the paper and demos are only a first step, Nvidia has already identified a direction for Magic3D’s future application: providing tools for mass-producing 3D models for games and metaverse worlds, and making 3D synthesis accessible to everyone.

Of course, Nvidia’s own Omniverse may be the first to launch this feature.
