HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization

Author(s)
Lee, Seonho; Hwang, Ranggi; Park, Jongse; Rhu, Minsoo
Issued Date
2023-01
DOI
10.1109/LCA.2022.3233832
URI
https://scholarworks.unist.ac.kr/handle/201301/88086
Citation
IEEE COMPUTER ARCHITECTURE LETTERS, v.22, no.1, pp.13 - 16
Abstract
The recent advancement of natural language processing (NLP) models is the result of ever-increasing model sizes and datasets. Most modern NLP models adopt the Transformer-based architecture, whose main bottleneck lies in the self-attention mechanism. As the computation required for self-attention grows rapidly with model size, self-attention has become the main challenge in deploying NLP models. Consequently, several prior works have sought to address this bottleneck, but most of them suffer from significant design overheads and additional training requirements. In this work, we propose HAMMER, a hardware-friendly approximate computing solution for self-attention employing mean-redistribution and linearization, which effectively improves the performance of the self-attention mechanism with low overheads. Compared to previous state-of-the-art self-attention accelerators, HAMMER improves performance by 1.2-1.6x and energy efficiency by 1.2-1.5x.
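The abstract does not spell out HAMMER's mean-redistribution and linearization scheme, so the sketch below is only an illustration of the underlying problem: exact softmax self-attention costs O(n^2 * d) in the sequence length n, while a generic kernel-based linearization brings this down to O(n * d^2). The function names, the feature map phi, and the shapes are assumptions for illustration and do not represent the paper's actual design.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product self-attention.
    Forms an (n, n) score matrix, so cost grows quadratically with
    sequence length n; this is the bottleneck the abstract refers to."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d)

def linearized_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Generic kernel-based linear attention (illustrative only; NOT
    HAMMER's mean-redistribution scheme).  Replacing softmax with a
    positive feature map phi lets the (d, d) summary phi(K).T @ V be
    reused for every query, giving O(n * d^2) work instead of O(n^2 * d)."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                  # (d, d) summary
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T       # (n, 1) normalizer
    return (Qp @ KV) / Z                           # (n, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 64                                 # sequence length, head dim
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)        # (128, 64)
    print(linearized_attention(Q, K, V).shape)     # (128, 64)
```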
Publisher
IEEE COMPUTER SOC
ISSN
1556-6056
Keyword (Author)
transformers; Approximate computing; hardware accelerator; neural network; sparse computation