Approximation and inversion methods for activation memory-efficient training of deep neural networks

Author(s)
Lee, Changhyeon
Advisor
Lee, Seulki
Issued Date
2024-02
URI
https://scholarworks.unist.ac.kr/handle/201301/82162
http://unist.dcollection.net/common/orgView/200000743504
Abstract
Training deep neural networks on large datasets is constrained by heavy computational and memory requirements. Various model optimization techniques have been explored to mitigate these costs, but their benefits are often confined to specific models. Broader relief can be obtained by making the training of common model components, such as attention mechanisms and convolution layers, memory-efficient. For attention mechanisms, we focus on the softmax function used to compute attention scores and propose an approximation method that reduces the activation memory consumed when training attention-based models. In the forward pass, only part of the softmax output is stored for backpropagation; during the backward pass, the unstored portion is approximated, and the resulting approximate softmax output is used to compute the gradients. For the convolution layer, we introduce an algorithm that, analogous to the softmax approximation, recreates a modified version of the input for the gradient update instead of retaining the original input in memory. Both designs make training more memory-efficient by approximating the information needed during backpropagation rather than storing it.
Publisher
Ulsan National Institute of Science and Technology
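
The abstract describes storing only part of the softmax output in the forward pass and approximating the rest when gradients are computed. As a rough, hypothetical sketch of that idea (not the thesis's actual algorithm), the following PyTorch snippet keeps only the k largest softmax values per row and spreads the remaining probability mass uniformly when reconstructing the output for the backward pass; the class name TopKSoftmax, the parameter k, and the uniform-redistribution rule are illustrative assumptions.

import torch


class TopKSoftmax(torch.autograd.Function):
    """Softmax whose backward pass works from a partially stored output.

    Hypothetical sketch: only the k largest softmax values per row are saved
    for backpropagation; the unsaved probability mass is redistributed
    uniformly when the output is reconstructed in backward().
    """

    @staticmethod
    def forward(ctx, x, k):
        y = torch.softmax(x, dim=-1)
        top_vals, top_idx = torch.topk(y, k, dim=-1)   # store only k values per row
        ctx.save_for_backward(top_vals, top_idx)
        ctx.out_shape, ctx.k = y.shape, k
        return y

    @staticmethod
    def backward(ctx, grad_out):
        top_vals, top_idx = ctx.saved_tensors
        n = ctx.out_shape[-1]
        # Reconstruct an approximate softmax output: stored entries are exact,
        # the leftover mass (1 - their sum) is spread over the other positions.
        leftover = (1.0 - top_vals.sum(dim=-1, keepdim=True)).clamp_min(0.0)
        y_approx = (leftover / max(n - ctx.k, 1)).expand(ctx.out_shape).clone()
        y_approx.scatter_(-1, top_idx, top_vals)
        # Softmax Jacobian-vector product using the approximate output:
        # dL/dx = y * (g - sum_j g_j * y_j)
        dot = (grad_out * y_approx).sum(dim=-1, keepdim=True)
        return y_approx * (grad_out - dot), None       # None: no gradient for k

A model would call TopKSoftmax.apply(attention_logits, 8) in place of torch.softmax; how much activation memory this saves, and how the approximation error affects training, depends on the choice of k.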
