File Download

There are no files associated with this item.

  • Find it @ UNIST provides direct access to the published full text of this article (UNISTARs only).
Related Researcher

Hwang, Ranggi (황랑기)

Detailed Information


Full metadata record

DC Field Value Language
dc.citation.endPage 16 -
dc.citation.number 1 -
dc.citation.startPage 13 -
dc.citation.title IEEE COMPUTER ARCHITECTURE LETTERS -
dc.citation.volume 22 -
dc.contributor.author Lee, Seonho -
dc.contributor.author Hwang, Ranggi -
dc.contributor.author Park, Jongse -
dc.contributor.author Rhu, Minsoo -
dc.date.accessioned 2025-09-25T13:30:01Z -
dc.date.available 2025-09-25T13:30:01Z -
dc.date.created 2025-09-25 -
dc.date.issued 2023-01 -
dc.description.abstract The recent advancement of natural language processing (NLP) models is the result of ever-increasing model sizes and datasets. Most modern NLP models adopt the Transformer architecture, whose main bottleneck lies in the self-attention mechanism. Because the computation required for self-attention grows rapidly with model size, self-attention has become the main challenge in deploying NLP models. Several prior works have sought to address this bottleneck, but most suffer from significant design overheads and additional training requirements. In this work, we propose HAMMER, a hardware-friendly approximate computing solution for self-attention that employs mean-redistribution and linearization, effectively improving the performance of the self-attention mechanism with low overhead. Compared to previous state-of-the-art self-attention accelerators, HAMMER improves performance by 1.2-1.6x and energy efficiency by 1.2-1.5x. -
dc.identifier.bibliographicCitation IEEE COMPUTER ARCHITECTURE LETTERS, v.22, no.1, pp.13 - 16 -
dc.identifier.doi 10.1109/LCA.2022.3233832 -
dc.identifier.issn 1556-6056 -
dc.identifier.scopusid 2-s2.0-85147207144 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/88086 -
dc.identifier.wosid 000932425700001 -
dc.language English -
dc.publisher IEEE COMPUTER SOC -
dc.title HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization -
dc.type Article -
dc.description.isOpenAccess FALSE -
dc.relation.journalWebOfScienceCategory Computer Science, Hardware & Architecture -
dc.relation.journalResearchArea Computer Science -
dc.type.docType Article -
dc.description.journalRegisteredClass scie -
dc.description.journalRegisteredClass scopus -
dc.subject.keywordAuthor transformers -
dc.subject.keywordAuthor Approximate computing -
dc.subject.keywordAuthor hardware accelerator -
dc.subject.keywordAuthor neural network -
dc.subject.keywordAuthor sparse computation -
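
The abstract above identifies the self-attention computation as the main Transformer bottleneck. For orientation, the sketch below shows standard scaled dot-product self-attention in NumPy, the baseline operation whose cost grows quadratically with sequence length. This is a generic illustration, not the paper's mean-redistribution or linearization technique; every name, shape, and weight is an assumption made for the example.

```python
# Minimal sketch of standard scaled dot-product self-attention.
# Generic baseline only; not HAMMER's approximation. All shapes and
# names are illustrative assumptions.
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (n, d) token embeddings; wq/wk/wv: (d, d) projection weights."""
    q, k, v = x @ wq, x @ wk, x @ wv            # three (n, d) projections
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (n, n): quadratic in sequence length n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                          # (n, d) attended output

rng = np.random.default_rng(0)
n, d = 128, 64                                  # sequence length, embedding dim
x = rng.standard_normal((n, d))
out = self_attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (128, 64)
```

The (n, n) score matrix is the term that dominates as sequences grow, which is why the abstract singles out self-attention as the deployment challenge that approximation schemes like HAMMER target.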

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.