Recent advances in deep neural networks have enabled high-level perception and reasoning across domains such as medical diagnosis and natural language processing, accelerating the shift from cloud-centric computation toward on-device intelligence. Deploying high-performance deep neural networks on edge platforms nevertheless remains challenging under strict constraints on computation, memory capacity, and energy efficiency. This thesis addresses these challenges through complementary algorithmic and hardware design approaches.

At the algorithmic level, this thesis presents MedBiSeNet, an efficient medical image segmentation network for real-time edge deployment. To robustly handle ambiguous and low-contrast boundaries in medical images, MedBiSeNet employs a boundary-enhanced bilateral path and a noise-refining feature fusion module. The proposed network achieves a Dice score of 0.9617 on polyp segmentation while reducing computational complexity by more than 17× compared with prior methods.

At the hardware level, this thesis proposes an energy-efficient processor architecture for on-device large language models. Exploiting the characteristics of ternary-weight large language models, the proposed design reduces both linear-layer computation and self-attention memory overhead through ternary weight clustering and packing, orthogonal LSB majority-bit approximation with approximation-in-memory, and a unified processing core supporting heterogeneous workloads. The processor achieves up to 18× higher energy efficiency than prior work, enabling practical inference of billion-parameter large language models on resource-limited edge devices.
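As a rough illustration of the storage savings that ternary weight packing can exploit, the sketch below packs weights drawn from {-1, 0, +1} into 2-bit codes, four weights per byte. This is a minimal sketch under assumed conventions: the function names and the particular 2-bit code assignment are hypothetical, not the clustering-and-packing scheme used by the proposed processor.

```python
# Hypothetical sketch: pack ternary weights {-1, 0, +1} into 2-bit codes,
# four weights per byte (4x denser than int8 storage). The encoding below
# is an assumption for illustration, not the thesis's actual scheme.

def pack_ternary(weights):
    """Pack a sequence of ternary weights (-1, 0, +1) into bytes, 4 per byte."""
    codes = {-1: 0b00, 0: 0b01, +1: 0b10}  # assumed 2-bit code per trit
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)  # place each 2-bit code in its slot
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, n):
    """Recover the first n ternary weights from packed bytes."""
    decode = {0b00: -1, 0b01: 0, 0b10: +1}
    out = []
    for byte in packed:
        for j in range(4):
            if len(out) == n:
                return out
            out.append(decode[(byte >> (2 * j)) & 0b11])
    return out

w = [1, -1, 0, 0, 1, 1, -1]
p = pack_ternary(w)
assert unpack_ternary(p, len(w)) == w  # round-trip: 7 weights in 2 bytes vs 7 bytes as int8
```

The same 4x density argument extends to wider clusters (e.g. five trits per byte via base-3 encoding, since 3^5 = 243 <= 256), at the cost of slightly more decode logic.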
Publisher
Ulsan National Institute of Science and Technology