Design Automation and Test in Europe Conference, pp. 1575-1578
Abstract
While Coarse-Grained Reconfigurable Architectures (CGRAs) are very efficient at handling regular, compute-intensive loops, their weakness at control-intensive processing and their need for frequent reconfiguration require another processor, a role usually filled by the main processor. To minimize the overhead arising in such collaborative execution, we integrate a dedicated sequential processor (SP) with a reconfigurable array (RA), where the crucial problem is how to share memory between the SP and the RA while keeping the SP's memory access latency very short. We present a detailed architecture, control scheme, and program example of our approach, focusing on our optimized on-chip shared memory organization between the SP and the RA. Our preliminary results demonstrate that our optimized memory architecture is very effective in reducing kernel execution times (by 23.5% compared to a more straightforward alternative), and that our approach can significantly reduce the RA control overhead and other sequential code execution time in kernels, resulting in up to a 23.1% reduction in kernel execution time compared to a conventional system that uses the main processor for sequential code execution.