File Download

There are no files associated with this item.

Related Researcher

이슬기

Lee, Seulki
Embedded Artificial Intelligence Lab.


Full metadata record

DC Field Value Language
dc.citation.conferencePlace US -
dc.citation.conferencePlace New Orleans, USA -
dc.citation.title Neural Information Processing Systems -
dc.contributor.author Lee, Changhyeon -
dc.contributor.author Lee, Seulki -
dc.date.accessioned 2024-01-03T00:05:09Z -
dc.date.available 2024-01-03T00:05:09Z -
dc.date.created 2024-01-02 -
dc.date.issued 2023-12-14 -
dc.description.abstract In this paper, we propose to approximate the softmax output, which is the key product of the attention mechanism, to reduce its activation memory usage when training attention-based networks (aka Transformers). During the forward pass of the network, the proposed softmax output approximation method stores only a small fraction of the entire softmax output required for back-propagation and evicts the rest of the softmax output from memory. Then, during the backward pass, the evicted softmax activation output is approximated to compose the gradient to perform back-propagation for model training. Considering that most attention-based models heavily rely on the softmax-based attention module, which usually takes one of the biggest portions of the network, approximating the softmax activation output can be a simple yet effective way to decrease the training memory requirement of many attention-based networks. Experiments with various attention-based models and relevant tasks, i.e., machine translation, text classification, and sentiment analysis, show that it curtails the activation memory usage of the softmax-based attention module by up to 84% (6.2x less memory) in model training while achieving comparable or better performance, e.g., up to 5.4% higher classification accuracy. -
dc.identifier.bibliographicCitation Neural Information Processing Systems -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/67541 -
dc.identifier.url https://proceedings.neurips.cc/paper_files/paper/2023/hash/311257424b6d80e930fc93b224f0a63e-Abstract-Conference.html -
dc.language English -
dc.publisher Neural Information Processing Systems -
dc.title Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks -
dc.type Conference Paper -
dc.date.conferenceDate 2023-12-10 -
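
The abstract above describes storing only a small fraction of the softmax output during the forward pass and approximating the evicted activations during back-propagation. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the class name ApproxSoftmax, the keep_ratio parameter, the top-k retention policy, and the uniform redistribution of the evicted probability mass are all illustrative assumptions.

```python
import torch


class ApproxSoftmax(torch.autograd.Function):
    """Illustrative sketch: retain only the top-k softmax outputs for backward."""

    @staticmethod
    def forward(ctx, logits, keep_ratio):
        probs = torch.softmax(logits, dim=-1)
        n = probs.shape[-1]
        k = max(1, int(n * keep_ratio))
        # Keep only the k largest softmax outputs per row (plus their indices)
        # for back-propagation; the remaining (n - k) activations are evicted.
        top_vals, top_idx = probs.topk(k, dim=-1)
        ctx.save_for_backward(top_vals, top_idx)
        ctx.n = n
        return probs

    @staticmethod
    def backward(ctx, grad_output):
        top_vals, top_idx = ctx.saved_tensors
        n, k = ctx.n, top_idx.shape[-1]
        # Approximate the evicted entries by spreading the leftover probability
        # mass uniformly over the positions that were not kept (an assumed
        # reconstruction rule, used here only to illustrate the idea).
        leftover = (1.0 - top_vals.sum(dim=-1, keepdim=True)).clamp_min(0.0)
        approx = (leftover / max(n - k, 1)).expand_as(grad_output).clone()
        approx.scatter_(-1, top_idx, top_vals)
        # Standard softmax backward rule, applied to the approximated output.
        dot = (grad_output * approx).sum(dim=-1, keepdim=True)
        grad_logits = approx * (grad_output - dot)
        return grad_logits, None


# Example: attention logits of shape (batch, heads, query, key).
scores = torch.randn(2, 8, 128, 128, requires_grad=True)
probs = ApproxSoftmax.apply(scores, 0.16)   # retain ~16% of the softmax output
probs.sum().backward()                      # gradient uses the approximation
```

This snippet only illustrates the approximation step; in a full attention layer the matmul with the value tensor also saves the softmax output for its own backward pass, so that path would likewise need to work from the retained slice for the memory saving to materialize.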


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.