Related Researcher

Lee, Seulki (이슬기), Embedded Artificial Intelligence Lab.

Detailed Information


On-NAS: On-Device Neural Architecture Search on Memory-Constrained Intelligent Embedded Systems

Author(s)
Kim, Bosung; Lee, Seulki
Issued Date
2023-11-14
DOI
10.1145/3625687.3625814
URI
https://scholarworks.unist.ac.kr/handle/201301/67542
Fulltext
https://sensys.acm.org/2023/program/
Citation
ACM Conference on Embedded Networked Sensor Systems
Abstract
We introduce On-NAS, a memory-efficient on-device neural architecture search (NAS) solution that enables memory-constrained embedded devices to find the best deep model architecture and train it on the device. Based on cell-based differentiable NAS, it drastically curtails the massive memory requirement of architecture search, one of the major bottlenecks in realizing NAS on embedded devices. On-NAS first pre-trains a basic architecture block, called a meta cell, by combining n cells into a single condensed cell via two-fold meta-learning; this cell can flexibly evolve into various architectures, saving device storage space n times. Then, the offline-learned meta cell is loaded onto the device and unfolded to perform online on-device NAS via 1) expectation-based operation and edge pair search, enabling memory-efficient partial architecture search by reducing the required memory up to k and m/4 times, respectively, given k candidate operations and m nodes in a cell, and 2) step-by-step back-propagation, which saves the memory usage of the backward pass of the n-cell architecture up to n times. To the best of our knowledge, On-NAS is the first standalone NAS and training solution fully operable on embedded devices with limited memory. Our experiment results show that On-NAS effectively identifies optimal architectures and trains them on the device, on par with GPU-based NAS in both few-shot and full-task learning settings, e.g., achieving even 1.3% higher accuracy on miniImageNet, while reducing run-time memory and storage usage by up to 20x and 4x, respectively.
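The expectation-based operation search mentioned in the abstract targets a specific cost of differentiable NAS: a mixed edge normally materializes the outputs of all k candidate operations before combining them into a softmax-weighted sum. A minimal plain-Python sketch of that contrast follows; it is illustrative only (the function names and the running-accumulator formulation are ours, not the paper's), showing how the weighted sum can be accumulated one candidate at a time so only a single activation is live:

```python
import math

def softmax(z):
    # Numerically stable softmax over architecture parameters.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mixed_op_dense(x, ops, alpha):
    # Standard differentiable-NAS mixed edge: evaluate all k candidate
    # operations, keep all k output activations alive at once, then
    # take the softmax-weighted sum over them.
    w = softmax(alpha)
    outs = [op(x) for op in ops]               # k activations held simultaneously
    return [sum(wi * o[j] for wi, o in zip(w, outs)) for j in range(len(x))]

def mixed_op_streaming(x, ops, alpha):
    # Expectation-style accumulation: only one candidate's output is
    # alive at any moment; the weighted sum builds up in one buffer,
    # an up-to-k-fold saving in forward activation memory.
    w = softmax(alpha)
    acc = [0.0] * len(x)
    for wi, op in zip(w, ops):
        out = op(x)                            # single activation live here
        acc = [a + wi * v for a, v in zip(acc, out)]
    return acc
```

Both functions compute the same mixture, so the saving comes purely from peak memory, not from changing the search objective; the paper's step-by-step back-propagation applies an analogous one-piece-at-a-time idea to the backward pass across the n cells.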
Publisher
Association for Computing Machinery (ACM)


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.