IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, v.36, no.12, pp.1978 - 1988
Abstract
Coarse-grained reconfigurable architectures (CGRAs) can provide extremely energy-efficient acceleration for applications that are rich in arithmetic operations, such as digital signal processing and multimedia applications. Since those applications are often naturally represented by stream graphs, it is compelling to develop optimization strategies for stream graphs on CGRAs. One unique property of stream graphs is that they contain many kernels or loops, which creates both advantages and challenges when mapping them to CGRAs. This paper addresses two main problems in that mapping, namely the many-buffer problem and the control overhead problem, and presents our results of optimizing the execution of stream graphs on CGRAs, including our low-cost architecture extensions. Our evaluation results demonstrate that our software and hardware optimizations can help generate highly efficient mappings of stream applications to CGRAs, with a 3.4x speedup on average at the application level over CPU-only execution.