This thesis studies methods for creating 3D content of user-specific subjects. Recent text-to-3D generation via Score Distillation Sampling (SDS), which leverages a 2D text-to-image diffusion model to optimize a 3D model, has shown remarkable performance in zero-shot 3D content generation. Despite these advances, such methods often fail to generate user-defined 3D content, such as a 3D model of a user's own dog. As a result, text-to-3D customization has attracted increasing attention. However, the existing literature on text-to-3D customization focuses mainly on single-concept customization, which limits its application to more diverse scenarios. We explore a more complex scenario: multi-concept 3D customization, which aims to create 3D content that includes multiple user-defined concepts, such as a user's own dog sitting on their own car. Naively adapting existing text-to-3D customization methods often fails to produce multi-concept 3D content because of two significant challenges: poor multi-object generation and the concept mixing problem. To address these challenges, we introduce MAGIC-SD3D (Multi-concept Alignment and Geometric Integration with Concept-aware Score Distillation in 3D), which extends the principles of 2D customization to the complexities of generating coherent multi-concept 3D content.
Publisher
Ulsan National Institute of Science and Technology