Related Researcher

Lee, Seulki (이슬기)
Embedded Artificial Intelligence Lab.

Detailed Information

Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks

Author(s)
Lee, Changhyeon; Lee, Seulki
Issued Date
2023-12-14
URI
https://scholarworks.unist.ac.kr/handle/201301/67541
Fulltext
https://proceedings.neurips.cc/paper_files/paper/2023/hash/311257424b6d80e930fc93b224f0a63e-Abstract-Conference.html
Citation
Neural Information Processing Systems (NeurIPS 2023)
Abstract
In this paper, we propose to approximate the softmax output, which is the key product of the attention mechanism, to reduce its activation memory usage when training attention-based networks (a.k.a. Transformers). During the forward pass of the network, the proposed softmax output approximation method stores only a small fraction of the entire softmax output required for back-propagation and evicts the rest of the softmax output from memory. Then, during the backward pass, the evicted softmax activation output is approximated to compose the gradient used for back-propagation in model training. Considering that most attention-based models rely heavily on the softmax-based attention module, which usually takes one of the biggest portions of the network, approximating the softmax activation output can be a simple yet effective way to decrease the training memory requirement of many attention-based networks. Experiments with various attention-based models and relevant tasks, i.e., machine translation, text classification, and sentiment analysis, show that the method curtails the activation memory usage of the softmax-based attention module by up to 84% (6.2x less memory) during model training while achieving comparable or better performance, e.g., up to 5.4% higher classification accuracy.
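To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: it assumes a top-k policy for choosing the stored fraction of the softmax output and a uniform redistribution of the evicted probability mass during the backward pass; the paper's actual selection and approximation schemes may differ. The class name TopKSoftmax and the keep-count k are illustrative.

```python
import torch


class TopKSoftmax(torch.autograd.Function):
    """Softmax that saves only a fraction of its output for backward.

    Forward: compute the softmax normally, but store only the top-k
    entries per row (plus their indices) and evict the rest.
    Backward: rebuild an approximate softmax output from the stored
    fraction and plug it into the standard softmax gradient.
    """

    @staticmethod
    def forward(ctx, logits, k):
        y = torch.softmax(logits, dim=-1)
        topk_vals, topk_idx = torch.topk(y, k, dim=-1)
        ctx.save_for_backward(topk_vals, topk_idx)  # only k of n entries kept
        ctx.n, ctx.k = y.shape[-1], k
        return y  # full output is consumed downstream, then freed

    @staticmethod
    def backward(ctx, grad_out):
        topk_vals, topk_idx = ctx.saved_tensors
        n, k = ctx.n, ctx.k
        # Illustrative assumption: spread the leftover probability mass
        # uniformly over the n - k evicted positions.
        leftover = (1.0 - topk_vals.sum(-1, keepdim=True)) / max(n - k, 1)
        y_approx = leftover.expand(*topk_vals.shape[:-1], n).clone()
        y_approx.scatter_(-1, topk_idx, topk_vals)
        # Standard softmax backward using the approximated output:
        # dL/dx = y * (dL/dy - sum(dL/dy * y)).
        inner = (grad_out * y_approx).sum(-1, keepdim=True)
        return y_approx * (grad_out - inner), None


# Usage on attention scores of shape (batch, heads, queries, keys),
# keeping 8 of the 64 softmax entries per row for back-propagation.
scores = torch.randn(2, 8, 64, 64, requires_grad=True)
probs = TopKSoftmax.apply(scores, 8)
probs.sum().backward()
print(scores.grad.shape)  # torch.Size([2, 8, 64, 64])
```

Because only k of the n softmax entries per row are saved for the backward pass, the attention module's stored activation shrinks by roughly a factor of n/k, which is the source of the memory saving the abstract reports.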
Publisher
Neural Information Processing Systems
