Heterogeneity-aware resource management techniques for data-intensive applications

Author(s): Han, Myeonggyun
Advisor: Baek, Woongki
Issued Date: 2024-02
URI: https://scholarworks.unist.ac.kr/handle/201301/82201
     http://unist.dcollection.net/common/orgView/200000744060
Abstract
A wide range of applications have become data-intensive as they operate on the massive amounts of data generated by social network services, multimedia devices, and Internet of Things sensors. These data-intensive applications typically require enormous computational and memory resources to extract useful information from the data they encounter. To accommodate these computing and memory demands, hardware resources in computing systems are becoming highly heterogeneous. Specifically, hardware accelerators such as tensor processing units (TPUs) and neural processing units (NPUs) have been developed to address the ever-increasing computing demands of deep-learning applications. In addition, new memory devices such as high-bandwidth memory (HBM) and non-volatile memory (NVM) have been developed to meet the growing demand for higher memory performance, capacity, and cost-efficiency.

Heterogeneous computing and memory have great potential to significantly improve the performance and efficiency of data-intensive applications. However, taking full advantage of their capabilities poses significant challenges to system software, because it is the underlying system software that must manage the heterogeneous computing and memory resources effectively to maximize the metric of interest, such as performance or energy efficiency. This dissertation presents heterogeneity-aware resource management techniques that significantly improve the performance and efficiency of data-intensive applications by effectively exploiting heterogeneous computing and memory resources.

First, we investigate system software techniques that effectively schedule computations on heterogeneous computing devices for efficient deep-learning inference. To this end, we propose MOSAIC, a software-based system for heterogeneity-, communication-, and constraint-aware model slicing and execution for accurate and efficient inference on heterogeneous embedded systems. MOSAIC employs accurate models to estimate the execution and communication costs of the target inference workload and generates an efficient model slicing and execution plan using a dynamic-programming-based algorithm.
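
As a rough illustration of this kind of cost-model-driven planning, the sketch below assigns a linear chain of layers to devices by dynamic programming over estimated execution and communication costs. It is a minimal sketch only, not MOSAIC's actual algorithm; the cost tables, device indexing, and function name are hypothetical.

# Illustrative sketch: dynamic-programming slicing of a linear layer chain
# over heterogeneous devices, driven by estimated costs (hypothetical inputs).

def plan_slicing(exec_cost, comm_cost):
    """exec_cost[i][d]: estimated time of layer i on device d.
    comm_cost[i][p][d]: cost of moving layer i's output from device p to d.
    Returns (total_cost, per-layer device assignment)."""
    n_layers, n_devices = len(exec_cost), len(exec_cost[0])
    INF = float("inf")
    # best[d]: minimal cost of running layers 0..i with layer i on device d.
    best = [exec_cost[0][d] for d in range(n_devices)]
    choice = [[None] * n_devices]
    for i in range(1, n_layers):
        new_best, new_choice = [INF] * n_devices, [0] * n_devices
        for d in range(n_devices):
            for p in range(n_devices):
                c = best[p] + comm_cost[i - 1][p][d] + exec_cost[i][d]
                if c < new_best[d]:
                    new_best[d], new_choice[d] = c, p
        best, choice = new_best, choice + [new_choice]
    # Backtrack the cheapest assignment.
    d = min(range(n_devices), key=lambda k: best[k])
    total, plan = best[d], [d]
    for i in range(n_layers - 1, 0, -1):
        d = choice[i][d]
        plan.append(d)
    plan.reverse()
    return total, plan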

Second, we propose HERTI, a reinforcement learning-augmented system for efficient real-time inference on heterogeneous embedded systems. Through reinforcement learning, HERTI efficiently explores the state space and robustly finds a state that significantly improves the efficiency of the target inference workload while satisfying its deadline constraint. In addition, HERTI significantly accelerates the training process with accurate and lightweight cost estimators.
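
The toy sketch below conveys the flavor of learning-based exploration under a deadline constraint, using lightweight cost estimators in place of slow on-device measurement. It is a simplified epsilon-greedy search, not HERTI's agent; the configuration set and the estimator callbacks (estimate_latency, estimate_energy) are assumed placeholders.

# Hedged sketch: epsilon-greedy exploration of hypothetical configurations
# (e.g., core count, frequency level) guided by estimator feedback.
import random

def explore(configs, estimate_latency, estimate_energy, deadline,
            episodes=200, eps=0.2):
    """Return an efficient configuration that meets the deadline, learned
    from estimator feedback instead of on-device runs."""
    value = {c: 0.0 for c in configs}   # running reward estimate per config
    counts = {c: 0 for c in configs}
    for _ in range(episodes):
        if random.random() < eps:
            c = random.choice(configs)                 # explore
        else:
            c = max(configs, key=lambda k: value[k])   # exploit
        latency, energy = estimate_latency(c), estimate_energy(c)
        # Reward efficiency only when the deadline constraint is satisfied.
        reward = (1.0 / energy) if latency <= deadline else -1.0
        counts[c] += 1
        value[c] += (reward - value[c]) / counts[c]    # incremental mean
    feasible = [c for c in configs if estimate_latency(c) <= deadline]
    return min(feasible, key=estimate_energy) if feasible else None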

Third, we investigate a system software technique that effectively manages heterogeneous memory for high-performance deep learning. We analyze the characteristics of representative deep-learning workloads on a real heterogeneous memory system. Guided by the characterization results, we propose HALO, hotness- and lifetime-aware data placement and migration for high-performance deep learning on heterogeneous memory systems. HALO extracts the hotness and lifetime information of the tensors of the target deep-learning application from its dataflow graph and then dynamically places and migrates the tensors across heterogeneous memory nodes based on their hotness and lifetime characteristics.
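
As an illustration of hotness- and lifetime-aware placement, the following sketch greedily fills a fast memory tier (e.g., DRAM/HBM) with the tensors that are accessed most per byte and stay live the shortest, spilling the rest to a slow tier (e.g., NVM). The Tensor fields and the scoring heuristic are assumptions for illustration, not HALO's actual policy.

# Illustrative sketch: greedy two-tier tensor placement from hotness/lifetime
# estimates (assumed to come from the application's dataflow graph).
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    hotness: int     # estimated number of accesses
    lifetime: int    # estimated live steps

def place(tensors, fast_capacity):
    """Map each tensor to 'fast' or 'slow'."""
    # Favor tensors accessed often per byte that occupy the fast tier briefly;
    # the exact score is a hypothetical heuristic.
    def score(t):
        return t.hotness / (t.size * max(t.lifetime, 1))
    placement, used = {}, 0
    for t in sorted(tensors, key=score, reverse=True):
        if used + t.size <= fast_capacity:
            placement[t.name] = "fast"
            used += t.size
        else:
            placement[t.name] = "slow"
    return placement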

Finally, we investigate a system software technique for QoS-aware and efficient workload consolidation on heterogeneous memory systems based on software-defined far memory. We conduct an in-depth characterization of the impact of cores, memory, and compressed memory swap (CMS) on the QoS and throughput of consolidated latency-critical (LC) and batch applications. Guided by the characterization results, we propose COSMOS, a software-based runtime system for the coordinated management of cores, memory, and CMS for QoS-aware and efficient workload consolidation for memory-intensive applications. COSMOS dynamically collects runtime data from consolidated applications and the underlying system and allocates the resources to the consolidated applications in a way that achieves high throughput with strong QoS guarantees.
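
The sketch below illustrates the general shape of such a coordinated feedback loop: it periodically reads the tail latency of the latency-critical application and shifts cores, memory, and CMS capacity between the LC and batch workloads. The hooks (read_tail_latency, set_cores, set_memory_limit, set_cms_size) and the adjustment step sizes are hypothetical, not COSMOS's interface.

# Hedged sketch: a QoS-driven control loop over cores, memory, and compressed
# memory swap (CMS); all system hooks are caller-supplied placeholders.
import time

def control_loop(qos_target_ms, read_tail_latency,
                 set_cores, set_memory_limit, set_cms_size,
                 total_cores=16, total_mem_gb=64, interval_s=1.0):
    lc_cores, lc_mem, cms_gb = total_cores // 2, total_mem_gb // 2, 4
    while True:
        p99 = read_tail_latency()  # measured tail latency of the LC app
        if p99 > qos_target_ms:
            # QoS at risk: shift cores/memory to the LC app, reduce CMS pressure.
            lc_cores = min(total_cores - 1, lc_cores + 1)
            lc_mem = min(total_mem_gb - 1, lc_mem + 2)
            cms_gb = max(0, cms_gb - 1)
        else:
            # QoS slack: return resources to batch jobs to raise throughput.
            lc_cores = max(1, lc_cores - 1)
            lc_mem = max(1, lc_mem - 2)
            cms_gb += 1
        set_cores(lc=lc_cores, batch=total_cores - lc_cores)
        set_memory_limit(lc_gb=lc_mem, batch_gb=total_mem_gb - lc_mem)
        set_cms_size(cms_gb)
        time.sleep(interval_s)
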
Publisher: Ulsan National Institute of Science and Technology
