Generating 3D object shapes is one of the most challenging and significant tasks in 3D vision and computer graphics. Recent works have introduced effective approaches that create high-quality and diverse 3D objects by leveraging advanced neural representations and generative models. However, these methods require extensive training time and resources, often taking at least two to three days on more than four high-end GPUs. In addition, no open large 3D model is available for efficient fine-tuning, owing to the scarcity and high computational cost of 3D data. In this thesis, I propose a method for directly fine-tuning a large 2D vision model for 3D mesh generation based on the triplane representation. Triplanes store encoded 3D geometric information as 2D feature maps, serving as a bridge between the 2D and 3D domains. Although the pretrained parameters of the large vision model are optimized for the 2D domain, they can be used to initialize the parameters of the 3D generative model. By overcoming the differing nature of 2D image latents and triplane latents for 3D shapes, this approach significantly reduces the time required to learn 3D data. I also provide experiments and analyses, including additional parameter-efficient fine-tuning methods. The proposed fine-tuning approaches achieve much faster convergence on a single GPU. Moreover, models adapted with parameter-efficient fine-tuning not only require far fewer trainable parameters to store but also enable task switching by simply swapping the adapted weight parameters, while incurring no additional latency during inference.
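To illustrate why adapter-based parameter-efficient fine-tuning can incur no extra inference latency, the following is a minimal sketch assuming a LoRA-style low-rank adapter; the variable names, dimensions, and NumPy implementation are illustrative assumptions, not code from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 8, 2  # illustrative layer width and adapter rank

# Frozen pretrained weight (e.g., from a large 2D vision model).
base_weight = rng.standard_normal((d, d))

# Small trainable adapter factors: the only per-task parameters to store.
A = rng.standard_normal((rank, d)) * 0.01
B = np.zeros((d, rank))  # zero-init so adaptation starts as a no-op

def adapted_forward(x, W, B, A):
    """Forward pass applying the low-rank update on the fly."""
    return x @ (W + B @ A).T

# Before inference, the adapter can be merged into the base weight once,
# so the adapted model runs exactly like the original (no added latency).
merged_weight = base_weight + B @ A

x = rng.standard_normal((1, d))
y_adapter = adapted_forward(x, base_weight, B, A)
y_merged = x @ merged_weight.T
assert np.allclose(y_adapter, y_merged)
```

Switching tasks then amounts to subtracting one merged update and adding another, since only the small factors A and B differ between tasks.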
Publisher
Ulsan National Institute of Science and Technology