IEEE/ACM ISCA Workshop on Machine Learning for Systems 2019
Abstract
Today’s Internet must support applications with increasingly dynamic and heterogeneous connectivity requirements, such as video streaming, virtual reality (VR), and the Internet of Things. Yet current network management practices generally rely on pre-specified flow configurations, which may not be able to cope with dynamic flow priorities or changing network conditions, e.g., on volatile wireless links. In this work, we instead propose a model-free learning approach to find the optimal network policies for current flow requirements. This approach is attractive because no comprehensive models exist for how different protocol choices affect flow performance, which may further be affected by dynamically changing network conditions. However, it raises new technical challenges: policy configurations can affect the performance of multiple flows sharing the network resources, and this flow coupling cannot be readily handled by existing online learning algorithms. We therefore extend multi-armed bandit frameworks to propose new online learning algorithms for protocol selection with provably sublinear regret under certain conditions. We validate the algorithms through testbed experiments, demonstrating their performance advantages and scalability.
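To make the multi-armed bandit framing concrete, the sketch below casts each candidate protocol configuration as a bandit arm and runs the classical UCB1 index policy, which achieves logarithmic (hence sublinear) regret for independent arms. This is an illustrative baseline only: the arm set, reward model, and numbers are hypothetical, and the paper's algorithms additionally handle the coupling between flows sharing network resources, which plain UCB1 does not.

```python
import math
import random

class UCB1:
    """Minimal UCB1 bandit: each arm is a hypothetical protocol configuration.

    Illustrative sketch only; it assumes independent arms and ignores the
    flow-coupling effects that the proposed algorithms are designed to handle.
    """
    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was chosen
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self, t):
        # Play each arm once, then maximize mean reward + exploration bonus.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(t) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Toy experiment: three hypothetical configurations with different
# (assumed) mean normalized flow rewards, observed under Gaussian noise.
random.seed(0)
means = [0.3, 0.5, 0.8]  # assumed per-configuration expected reward
bandit = UCB1(len(means))
for t in range(1, 2001):
    arm = bandit.select(t)
    reward = min(1.0, max(0.0, random.gauss(means[arm], 0.1)))
    bandit.update(arm, reward)
best = max(range(len(means)), key=lambda a: bandit.counts[a])
```

After enough rounds, the arm with the highest mean reward accumulates the vast majority of the pulls, while suboptimal arms are sampled only logarithmically often; it is this growth rate of wasted pulls that sublinear-regret guarantees bound.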