Approximation and inversion methods for activation memory-efficient training of deep neural networks

Author(s)
Lee, Changhyeon
Advisor
Lee, Seulki
Issued Date
2024-02
URI
https://scholarworks.unist.ac.kr/handle/201301/82162
http://unist.dcollection.net/common/orgView/200000743504
Abstract
Training deep neural networks on large datasets is constrained by heavy computational and memory requirements. Various model optimization techniques have been explored to mitigate these costs, but their benefits are often confined to specific models. Broader relief can be obtained by making the training of common model components, such as attention mechanisms and convolution layers, memory-efficient. For attention mechanisms, we focus on the softmax function used to compute attention scores and propose an approximation method that reduces the activation memory consumed when training attention-based models. In the forward pass, only part of the softmax output is stored for backpropagation; during the backward pass, the unstored portion is approximated, and the resulting approximate softmax output is used to compute the gradients. For the convolution layer, we introduce an algorithm that, analogous to the softmax approximation, recreates a modified version of the input for the gradient update instead of retaining the original input in memory. Both designs make training more memory-efficient by approximating the information needed during backpropagation rather than storing it.
Publisher
Ulsan National Institute of Science and Technology
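
The abstract describes storing only part of the softmax output in the forward pass and approximating the rest when gradients are computed. As a rough, hypothetical sketch of that idea (not the thesis's actual algorithm), the following PyTorch snippet keeps only the k largest softmax values per row and spreads the remaining probability mass uniformly when reconstructing the output for the backward pass; the class name TopKSoftmax, the parameter k, and the uniform-redistribution rule are illustrative assumptions.

import torch


class TopKSoftmax(torch.autograd.Function):
    """Softmax whose backward pass works from a partially stored output.

    Hypothetical sketch: only the k largest softmax values per row are saved
    for backpropagation; the unsaved probability mass is redistributed
    uniformly when the output is reconstructed in backward().
    """

    @staticmethod
    def forward(ctx, x, k):
        y = torch.softmax(x, dim=-1)
        top_vals, top_idx = torch.topk(y, k, dim=-1)   # store only k values per row
        ctx.save_for_backward(top_vals, top_idx)
        ctx.out_shape, ctx.k = y.shape, k
        return y

    @staticmethod
    def backward(ctx, grad_out):
        top_vals, top_idx = ctx.saved_tensors
        n = ctx.out_shape[-1]
        # Reconstruct an approximate softmax output: stored entries are exact,
        # the leftover mass (1 - their sum) is spread over the other positions.
        leftover = (1.0 - top_vals.sum(dim=-1, keepdim=True)).clamp_min(0.0)
        y_approx = (leftover / max(n - ctx.k, 1)).expand(ctx.out_shape).clone()
        y_approx.scatter_(-1, top_idx, top_vals)
        # Softmax Jacobian-vector product using the approximate output:
        # dL/dx = y * (g - sum_j g_j * y_j)
        dot = (grad_out * y_approx).sum(dim=-1, keepdim=True)
        return y_approx * (grad_out - dot), None       # None: no gradient for k

A model would call TopKSoftmax.apply(attention_logits, 8) in place of torch.softmax; how much activation memory this saves, and how the approximation error affects training, depends on the choice of k.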
