An OPA-Enabled Hardware Architecture for On-Device Training in ReRAM Crossbar Arrays

Author(s)
Kim, Seungsu
Advisor
Lee, Jongeun
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/90954
http://unist.dcollection.net/common/orgView/200000965861
Abstract
Training deep neural networks (DNNs) directly on ReRAM crossbar arrays (RCAs) is highly desirable for overcoming the memory bottleneck and enabling massively parallel weight updates, but remains largely unexplored due to the lack of hardware architectures capable of handling the complexity of analog weight updates and peripheral interfaces. While outer product accumulation (OPA) has been proposed as a promising primitive for massively parallel weight updates, prior work has not demonstrated a concrete hardware architecture capable of supporting OPA-based training. This work presents the first end-to-end hardware architecture for OPA-enabled in-memory DNN training. The proposed design integrates RCAs with DAC/ADC interfaces and unified control logic supporting forward MVM, backward MTVM, and OPA weight-update operations. Nonlinear digital operations are delegated to a host CPU, while all linear algebra critical to training is executed on the RCA. We develop a complete hardware prototype using high-level synthesis (HLS) and validate it on FPGA, with the RCA behavior emulated in hardware to verify the architectural functionality. To support low-cost analog interfaces, we further propose a quantization methodology tailored for OPA-based IMC hardware, jointly optimizing DAC/ADC precision and training stability. Our results demonstrate that the proposed architecture performs functional on-device training, achieves efficient hardware performance through HLS pipelining and unrolling optimizations, and maintains competitive accuracy under low-precision analog constraints. This work establishes the first practical hardware pathway for OPA-driven in-memory training systems.
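The three crossbar operations named in the abstract (forward MVM, backward MTVM, and the OPA weight update) can be sketched numerically as follows. This is a minimal illustrative model only: the function names, array shapes, and learning rate are assumptions for exposition, not the thesis's hardware interface, and analog effects such as DAC/ADC quantization and device nonlinearity are omitted.

```python
import numpy as np

# Illustrative model of the three RCA primitives used in OPA-based training.
# G models the crossbar conductance (weight) matrix, shape (fan_in, fan_out).

def mvm(G, x):
    """Forward pass: matrix-vector multiply through the crossbar, y = G^T x."""
    return G.T @ x

def mtvm(G, delta):
    """Backward pass: transposed MVM, propagating the error vector upstream."""
    return G @ delta

def opa_update(G, x, delta, lr=0.01):
    """OPA weight update: the outer product x delta^T is accumulated onto all
    crossbar cells in parallel, G <- G - lr * x delta^T (in place)."""
    G -= lr * np.outer(x, delta)
    return G
```

The key property OPA exploits is that every element of the rank-1 update `x delta^T` is applied simultaneously, rather than cell by cell as in row/column-wise programming.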
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Department of Electrical Engineering

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.