Deep convolutional neural network (CNN) inference on mobile devices is often desirable for user experience and privacy. However, running inference of computation- and memory-intensive CNNs on mobile devices is challenging because these devices are resource-limited and prolonging battery life is important. Because of a CNN's layer-wise structure, the blocks constituting a single CNN model can have different workload characteristics. To fully exploit DVFS for increasing the energy efficiency of mobile devices, it is important to understand the workload characteristics of each block in a model. In a preliminary study, we observe that the energy-delay trade-off of each block in MobileNet-V3 differs with the core frequency and is related to the memory access rate during inference. In this paper, we show that the memory access rate is affected by the structure of a block and the cache size of the device, and that scaling the core frequency and memory bandwidth differently according to the memory access rate can increase energy efficiency. Based on these findings, we propose NeuroValve, a fine-grained CPU clock frequency and memory bandwidth control system for running CNNs on mobile devices. We implement NeuroValve on the Pixel 3a and Pixel 3 and perform extensive evaluations over various state-of-the-art CNN models. Our results confirm that NeuroValve conserves energy while incurring only a slight delay increase compared to Android's default settings.
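The core idea — choosing a CPU frequency and memory bandwidth per block according to its memory access rate — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function name, the thresholds, and the operating-point tables are all assumptions made for illustration.

```python
# Hypothetical sketch of per-block frequency/bandwidth selection.
# All names, thresholds, and frequency tables here are illustrative
# assumptions, not NeuroValve's real policy or the Pixel 3/3a tables.

CPU_FREQS_KHZ = [825600, 1612800, 2208000]   # low, mid, high core clocks
MEM_BW_MHZ = [547, 1017, 1804]               # low, mid, high bus bandwidth

def select_operating_point(mem_access_rate):
    """Pick a (cpu_freq_khz, mem_bw_mhz) pair for one CNN block.

    mem_access_rate: memory accesses per instruction for the block,
    e.g. measured with hardware performance counters. A memory-bound
    block gains little from a fast core, so we lower the CPU clock
    and raise memory bandwidth; a compute-bound block gets the
    opposite treatment.
    """
    if mem_access_rate > 0.10:      # memory-bound block
        return CPU_FREQS_KHZ[0], MEM_BW_MHZ[2]
    elif mem_access_rate > 0.03:    # mixed block
        return CPU_FREQS_KHZ[1], MEM_BW_MHZ[1]
    else:                           # compute-bound block
        return CPU_FREQS_KHZ[2], MEM_BW_MHZ[0]
```

On a real device, the chosen pair would be applied before each block executes, e.g. through the Linux kernel's cpufreq interface for the core clock and a bus-bandwidth mechanism such as devfreq for memory.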
Publisher: Ulsan National Institute of Science and Technology (UNIST)