dc.citation.conferencePlace | US | -
dc.citation.conferencePlace | New Orleans, USA | -
dc.citation.title | Neural Information Processing Systems | -
dc.contributor.author | Lee, Changhyeon | -
dc.contributor.author | Lee, Seulki | -
dc.date.accessioned | 2024-01-03T00:05:09Z | -
dc.date.available | 2024-01-03T00:05:09Z | -
dc.date.created | 2024-01-02 | -
dc.date.issued | 2023-12-14 | -
dc.description.abstract | In this paper, we propose to approximate the softmax output, which is the key product of the attention mechanism, to reduce its activation memory usage when training attention-based networks (a.k.a. Transformers). During the forward pass of the network, the proposed softmax output approximation method stores only a small fraction of the entire softmax output required for back-propagation and evicts the rest of the softmax output from memory. Then, during the backward pass, the evicted softmax activation output is approximated to compose the gradient used for back-propagation during model training. Since most attention-based models rely heavily on the softmax-based attention module, which usually accounts for one of the largest portions of the network, approximating the softmax activation output can be a simple yet effective way to decrease the training memory requirement of many attention-based networks. Experiments with various attention-based models on relevant tasks, i.e., machine translation, text classification, and sentiment analysis, show that the method curtails the activation memory usage of the softmax-based attention module by up to 84% (6.2x less memory) in model training while achieving comparable or better performance, e.g., up to 5.4% higher classification accuracy. | -
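To make the idea in the abstract concrete, below is a minimal PyTorch sketch of storing only a fraction of the softmax output in the forward pass and approximating the evicted activations in the backward pass. It is not the authors' implementation: the top-k selection of which entries to keep and the zero-fill of evicted entries are illustrative assumptions; the paper only states that a small fraction is stored and the rest is approximated.

```python
import torch

class TopKSoftmax(torch.autograd.Function):
    """Softmax whose backward pass uses an approximated output.

    Forward: compute the full softmax, but save only the k largest
    entries per row for back-propagation; the rest is evicted.
    Backward: rebuild an approximate softmax output from the saved
    fraction and use it in the standard softmax gradient formula.
    """

    @staticmethod
    def forward(ctx, logits, k):
        y = torch.softmax(logits, dim=-1)
        # Store only a small fraction of the softmax output
        # (top-k per row is an assumption, not the paper's rule).
        vals, idx = torch.topk(y, k, dim=-1)
        ctx.save_for_backward(vals, idx)
        ctx.out_shape = y.shape
        return y

    @staticmethod
    def backward(ctx, grad_out):
        vals, idx = ctx.saved_tensors
        # Approximate the evicted activations with zeros (assumption).
        y_approx = torch.zeros(ctx.out_shape, dtype=grad_out.dtype,
                               device=grad_out.device)
        y_approx.scatter_(-1, idx, vals)
        # Standard softmax gradient, computed from the approximation:
        # dL/dx = y * (dL/dy - sum_j(dL/dy_j * y_j))
        inner = (grad_out * y_approx).sum(dim=-1, keepdim=True)
        grad_logits = y_approx * (grad_out - inner)
        return grad_logits, None
```

In an attention module one would replace `attn = torch.softmax(scores, dim=-1)` with `attn = TopKSoftmax.apply(scores, k)` for some small `k`, trading gradient exactness for a smaller activation footprint.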
dc.identifier.bibliographicCitation | Neural Information Processing Systems | -
dc.identifier.uri | https://scholarworks.unist.ac.kr/handle/201301/67541 | -
dc.identifier.url | https://proceedings.neurips.cc/paper_files/paper/2023/hash/311257424b6d80e930fc93b224f0a63e-Abstract-Conference.html | -
dc.language | English | -
dc.publisher | Neural Information Processing Systems | -
dc.title | Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks | -
dc.type | Conference Paper | -
dc.date.conferenceDate | 2023-12-10 | -