This thesis investigates partitioning algorithms and task scheduling policies for pipelined model parallelism (PMP) execution on heterogeneous GPU clusters. We study how diverse models behave in the PMP environment and examine how partitioning algorithms and task scheduling policies differ in their effect on execution efficiency. We propose a partitioning algorithm that finds an efficient model partition for PMP execution on heterogeneous GPU clusters. The thesis also covers partitioning that accounts for NasNet's multi-connected layers and GNMT's PMP-friendly structure, and addresses the problems that arise when running multiple minibatches on a PMP VW. We examine the effect of the scheduling policy given a previously determined partition and an effective number of minibatches, and offer guidance for choosing a scheduling policy under those execution conditions. Finally, we evaluate each partitioning algorithm and task scheduling policy and show that both affect performance in PMP execution.
Publisher
Ulsan National Institute of Science and Technology (UNIST)