IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, v.41, no.10, pp.3387 - 3399
Abstract
DNN (Deep Neural Network) processing units, or DPUs, are one of the most energy-efficient platforms for DNN applications. However, designing new DPUs for every DNN model is very costly and time-consuming. In this paper we propose an alternative approach: to specialize coarse-grained reconfigurable architectures (CGRAs), which are already quite capable of delivering high performance and high energy efficiency for compute-intensive kernels. We identify a small set of architectural features on a baseline CGRA to enable high performance mapping of depthwise convolution (DWC) and pointwise convolution (PWC) kernels, which are the most important building block in recent light-weight DNN models. Our experimental results using MobileNets demonstrate that our proposed CGRA enhancement can deliver 8 18× improvement in area-delay product depending on layer type, over a baseline CGRA with a state-of-the-art CGRA compiler. Moreover, our proposed CGRA architecture can also speed up 3D convolution with similar efficiency as previous work, demonstrating the effectiveness of our architectural features beyond depthwise separable convolution layers.