The transformer architecture has enabled large language models (LLMs) to improve a wide range of AI applications. Its core component, the multi-head self-attention mechanism, presents a major bottleneck due to its extensive computational and memory bandwidth requirements. While recent approaches using sparse attention and attention formula reordering address these challenges, efficient LLM processing remains a key bottleneck on existing hardware. This thesis proposes an energy-efficient eDRAM-based compute-in/near-memory (CINM) processor that mitigates these bottlenecks through three key features. First, an attention block fusion computation strategy is employed to maximize data reuse within the attention map. This approach yields an 85.86% reduction in external memory access and raises hardware utilization to 86.1%. Second, a CINM architecture resolves the imbalance between memory and computation, which, combined with a heterogeneous pipeline, achieves a 77.27% reduction in system latency. Third, a compute-in-memory array supporting the cross-read operation eliminates data direction conflicts, resulting in a 98.44% latency reduction. Furthermore, this array utilizes dual-row computation with reduced adder logic to improve energy efficiency by 1.58×. The processor, designed in 28 nm CMOS technology, achieves 36.28–58.05 TOPS/W and demonstrates an F1 score of 92.41% on the SQuAD v1.1 benchmark using the BigBird-large model.
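To illustrate the general idea behind block-fused attention, the following minimal NumPy sketch computes attention tile by tile with an online softmax, so that each partial attention-map block stays local instead of being written to external memory. The tile sizes, variable names, and online-softmax bookkeeping are illustrative assumptions and do not reflect the processor's actual dataflow or RTL.

```python
# Minimal sketch (assumed dataflow, not the thesis implementation):
# blockwise-fused attention that never materializes the full attention map.
import numpy as np

def fused_block_attention(Q, K, V, block_q=64, block_k=64):
    n, d = Q.shape
    out = np.zeros((n, d))
    for qs in range(0, n, block_q):
        qe = min(qs + block_q, n)
        Qb = Q[qs:qe]                              # query tile kept "on chip"
        m = np.full(qe - qs, -np.inf)              # running row maximum
        l = np.zeros(qe - qs)                      # running softmax denominator
        acc = np.zeros((qe - qs, d))               # running weighted-V accumulator
        for ks in range(0, n, block_k):
            ke = min(ks + block_k, n)
            S = Qb @ K[ks:ke].T / np.sqrt(d)       # partial attention-map block
            m_new = np.maximum(m, S.max(axis=1))
            P = np.exp(S - m_new[:, None])
            scale = np.exp(m - m_new)              # rescale previous partial results
            l = l * scale + P.sum(axis=1)
            acc = acc * scale[:, None] + P @ V[ks:ke]
            m = m_new
        out[qs:qe] = acc / l[:, None]
    return out

# Quick check against the naive (full attention map) formulation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(64)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(fused_block_attention(Q, K, V), ref, atol=1e-6)
```

The point of the sketch is only that fusing the score, softmax, and value-weighting steps per tile bounds intermediate storage to one block, which is the kind of reuse that reduces external memory access in the fused strategy described above.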
Publisher: Ulsan National Institute of Science and Technology