A Triplane Looks Like an Image: Parameter-Efficient Fine-tuning of Large Vision Model for 3D Shape Generation

Alternative Title
A Study on 3D Shape Generation Methods Based on Fine-tuning of Large Vision Models
Author(s)
Park, Sangjune
Advisor
Joo, Kyungdon
Issued Date
2024-08
URI
https://scholarworks.unist.ac.kr/handle/201301/84193
http://unist.dcollection.net/common/orgView/200000813268
Abstract
Generating 3D object shapes is one of the most challenging and significant tasks in 3D vision and computer graphics. Recent works have introduced useful approaches that create high-quality and diverse 3D objects, leveraging advanced neural representations and generative models. However, these methods require extensive training time and resources, often taking at least two to three days on four or more high-end GPUs. In addition, due to the scarcity and high computational cost of 3D data, there is no open large model that can be fine-tuned for efficient training. In this thesis, I propose a method for directly fine-tuning a 2D large vision model for 3D mesh generation based on the triplane representation. Triplanes store encoded 3D geometric information in the form of 2D feature maps, serving as a bridge between the 2D and 3D domains. Although the pretrained parameters of the large vision model are optimized for the 2D domain, they can be used to initialize the parameters of the 3D generative model. By overcoming the differing natures of 2D image latents and triplane latents for 3D shapes, this approach significantly reduces the time required to learn 3D data. I also provide experiments and analyses, including additional parameter-efficient fine-tuning methods. The proposed fine-tuning approaches achieve much faster convergence on a single GPU. Moreover, models adapted with parameter-efficient fine-tuning not only have far fewer trainable parameters to store, but also enable task switching by simply swapping the adapted weight parameters, while incurring no additional latency during inference.
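The abstract's claim of "no additional latency during inference" is characteristic of low-rank adapter methods, where the learned update can be merged back into the frozen pretrained weight. A minimal sketch of that mechanism follows, assuming a LoRA-style adapter; the class name, shapes, and rank are illustrative and not taken from the thesis's actual architecture.

```python
import numpy as np

class LoRALinear:
    """A frozen pretrained weight W plus a trainable low-rank update B @ A.

    W here stands in for a layer of the 2D-pretrained large vision model;
    only A and B would be trained on triplane latents.
    """

    def __init__(self, w, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w                                  # frozen pretrained weight
        d_out, d_in = w.shape
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, small init
        self.b = np.zeros((d_out, rank))              # trainable, zero init

    def forward(self, x):
        # Adapted path: W x + B (A x); extra cost only during training.
        return self.w @ x + self.b @ (self.a @ x)

    def merged_weight(self):
        # Merging folds the adapter into the base weight: W' = W + B A.
        # Inference then uses a single matmul, so no added latency, and
        # switching tasks means swapping only the small (A, B) pair.
        return self.w + self.b @ self.a

w = np.eye(3)
layer = LoRALinear(w, rank=2)
x = np.ones(3)

# With B zero-initialized, the adapted model starts identical to the base.
assert np.allclose(layer.forward(x), w @ x)
# The merged weight reproduces the adapted forward pass exactly.
assert np.allclose(layer.merged_weight() @ x, layer.forward(x))
```

Storing only the (A, B) pairs per task is what makes the adapted weights cheap to keep and swap, as the abstract describes.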
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.