Recent advances in deep neural networks have enabled high-level perception and reasoning across domains such as medical diagnosis and natural language processing, accelerating the shift from cloud-centric computation toward on-device intelligence. Deploying high-performance deep neural networks on edge platforms nevertheless remains challenging under strict constraints on computation, memory capacity, and energy efficiency. This thesis addresses these challenges through complementary algorithmic and hardware design approaches.

At the algorithmic level, this thesis presents MedBiSeNet, an efficient medical image segmentation network for real-time edge deployment. To robustly handle ambiguous and low-contrast boundaries in medical images, MedBiSeNet employs a boundary-enhanced bilateral path and a noise-refining feature fusion module. The proposed network achieves a Dice score of 0.9617 on polyp segmentation while reducing computational complexity by more than 17× compared with prior methods.

At the hardware level, this thesis proposes an energy-efficient processor architecture for on-device large language models. Exploiting the characteristics of ternary-weight large language models, the proposed design reduces both linear-layer computation and self-attention memory overhead through ternary weight clustering and packing, orthogonal LSB majority-bit approximation with approximation-in-memory, and a unified processing core supporting heterogeneous workloads. The processor achieves up to 18× higher energy efficiency than prior work, enabling practical inference of billion-parameter large language models on resource-limited edge devices.
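As a rough illustration of the storage savings that ternary weight packing can exploit, the sketch below packs weights drawn from {-1, 0, +1} into 2-bit codes, four weights per byte. This is a minimal sketch under assumed conventions: the function names and the particular 2-bit code assignment are hypothetical, not the clustering-and-packing scheme used by the proposed processor.

```python
# Hypothetical sketch: pack ternary weights {-1, 0, +1} into 2-bit codes,
# four weights per byte (4x denser than int8 storage). The encoding below
# is an assumption for illustration, not the thesis's actual scheme.

def pack_ternary(weights):
    """Pack a sequence of ternary weights (-1, 0, +1) into bytes, 4 per byte."""
    codes = {-1: 0b00, 0: 0b01, +1: 0b10}  # assumed 2-bit code per trit
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= codes[w] << (2 * j)  # place each 2-bit code in its slot
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, n):
    """Recover the first n ternary weights from packed bytes."""
    decode = {0b00: -1, 0b01: 0, 0b10: +1}
    out = []
    for byte in packed:
        for j in range(4):
            if len(out) == n:
                return out
            out.append(decode[(byte >> (2 * j)) & 0b11])
    return out

w = [1, -1, 0, 0, 1, 1, -1]
p = pack_ternary(w)
assert unpack_ternary(p, len(w)) == w  # round-trip: 7 weights in 2 bytes vs 7 bytes as int8
```

The same 4x density argument extends to wider clusters (e.g. five trits per byte via base-3 encoding, since 3^5 = 243 <= 256), at the cost of slightly more decode logic.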
Publisher
Ulsan National Institute of Science and Technology