International Symposium on Code Generation and Optimization, pp. 94-104
Abstract
Stream graphs provide a natural way to represent many applications in the multimedia and DSP domains. Though the exposed parallelism of stream graphs makes it relatively easy to map them to general-purpose GPUs (GP-GPUs), handling very large stream graphs and exploiting multi-GPU platforms to achieve scalable performance pose great challenges for stream graph mapping. Previous work either considers only a single GPU or relies on a crude heuristic that achieves a very low degree of workload balancing and thus shows only limited scalability. In this paper, we present a highly scalable GP-GPU mapping technique for large stream graphs with the following highlights: (1) an accurate GPU performance estimation model for subsets of stream graphs, (2) a novel partitioning heuristic that exploits the structural properties of stream graphs, and (3) an ILP (Integer Linear Programming) formulation of the mapping problem. Our experimental results on a real GPU platform demonstrate that our technique achieves scalable performance on up to 4 GPUs for large stream graphs and generates highly optimized multi-GPU code, especially for compute-bound graphs.