2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)
Abstract
Ensuring high availability for applications despite unpredictable failures of cloud components is a well-known problem in managing cloud infrastructure. One proposed solution is VM redundancy: cloud resources are reserved for backup VMs that can substitute for primary VMs when a failure occurs. However, this solution decreases cloud resource utilization, since the backup resources usually remain idle. In this paper, we propose ECHO, a cloud resource management system that overbooks these backup VMs. ECHO optimizes the overbooking rate to balance two competing goals: maximizing cloud resource utilization, and thus the cloud provider’s revenue, and improving application availability, and thus user satisfaction. Specifically, ECHO first obtains the optimal overbooking rate required to achieve the cloud provider’s desired resource utilization level. It then computes the number of backup VMs required to maintain a given application availability level. Our extensive experimental and simulation results show that ECHO can increase the number of accepted applications whose availability requirements are satisfied by about 30%, while simultaneously increasing resource utilization.