Full metadata record

DC Field Value Language
dc.contributor.advisor Yoon, Heein -
dc.contributor.author An, Sun Hong -
dc.date.accessioned 2026-03-26T22:14:01Z -
dc.date.available 2026-03-26T22:14:01Z -
dc.date.issued 2026-02 -
dc.description.abstract The transformer architecture has enabled large language models (LLMs) to improve a wide range of AI applications. A primary component, the multi-head self-attention mechanism, presents a major bottleneck due to its extensive computational and memory bandwidth requirements. While recent approaches leveraging sparse attention and attention formula reordering address these challenges, efficient LLM processing remains a key challenge for existing hardware. This thesis proposes an energy-efficient compute-in/near-memory (CINM) processor using eDRAM to mitigate these bottlenecks through three key features. First, an attention block fusion computation strategy is employed to maximize data reuse within the attention map. This approach yields an 85.86% reduction in external memory access and raises hardware utilization to 86.1%. Second, a CINM architecture resolves the imbalance between memory and computation, which, combined with a heterogeneous pipeline, achieves a 77.27% reduction in system latency. Third, a compute-in-memory array supporting the cross-read operation eliminates data direction conflicts, resulting in a 98.44% latency reduction. Furthermore, this array utilizes dual-row computation with reduced adder logic to improve energy efficiency by 1.58×. The processor, designed in 28 nm CMOS technology, achieves 36.28–58.05 TOPS/W and demonstrates an F1 score of 92.41% on the SQuAD v1.1 benchmark using the BigBird-large model. -
dc.description.degree Master -
dc.description Department of Electrical Engineering -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/90965 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000965198 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.subject Compute-in-memory, eDRAM, Sparse attention, Large language models -
dc.title An Energy-Efficient Compute-in/near-Memory eDRAM Processor for Sparse Transformer-Based Large Language Models -
dc.type Thesis -


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.