An OPA-Enabled Hardware Architecture for On-Device Training in ReRAM Crossbar Arrays

Author(s)
Kim, Seungsu
Advisor
Lee, Jongeun
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/90954
http://unist.dcollection.net/common/orgView/200000965861
Abstract
Training deep neural networks (DNNs) directly on ReRAM crossbar arrays (RCAs) is highly desirable for overcoming the memory bottleneck and enabling massively parallel weight updates, but remains largely unexplored due to the lack of hardware architectures capable of handling the complexity of analog weight updates and peripheral interfaces. While outer product accumulation (OPA) has been proposed as a promising primitive for massively parallel weight updates, prior work has not demonstrated a concrete hardware architecture capable of supporting OPA-based training. This work presents the first end-to-end hardware architecture for OPA-enabled in-memory DNN training. The proposed design integrates RCAs with DAC/ADC interfaces and unified control logic supporting forward MVM, backward MTVM, and OPA weight-update operations. Nonlinear digital operations are delegated to a host CPU, while all linear algebra critical to training is executed on the RCA. We develop a complete hardware prototype using high-level synthesis (HLS) and validate it on FPGA, with the RCA behavior emulated in hardware to verify the architectural functionality. To support low-cost analog interfaces, we further propose a quantization methodology tailored for OPA-based IMC hardware, jointly optimizing DAC/ADC precision and training stability. Our results demonstrate that the proposed architecture performs functional on-device training, achieves efficient hardware performance through HLS pipelining and unrolling optimizations, and maintains competitive accuracy under low-precision analog constraints. This work establishes the first practical hardware pathway for OPA-driven in-memory training systems.
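The three crossbar operations named in the abstract (forward MVM, backward MTVM, and the OPA weight update) can be sketched numerically as follows. This is a minimal illustrative model only: the function names, array shapes, and learning rate are assumptions for exposition, not the thesis's hardware interface, and analog effects such as DAC/ADC quantization and device nonlinearity are omitted.

```python
import numpy as np

# Illustrative model of the three RCA primitives used in OPA-based training.
# G models the crossbar conductance (weight) matrix, shape (fan_in, fan_out).

def mvm(G, x):
    """Forward pass: matrix-vector multiply through the crossbar, y = G^T x."""
    return G.T @ x

def mtvm(G, delta):
    """Backward pass: transposed MVM, propagating the error vector upstream."""
    return G @ delta

def opa_update(G, x, delta, lr=0.01):
    """OPA weight update: the outer product x delta^T is accumulated onto all
    crossbar cells in parallel, G <- G - lr * x delta^T (in place)."""
    G -= lr * np.outer(x, delta)
    return G
```

The key property OPA exploits is that every element of the rank-1 update `x delta^T` is applied simultaneously, rather than cell by cell as in row/column-wise programming.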
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Department of Electrical Engineering

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.