IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, v.71, no.5, pp.2093 - 2104
Abstract
An energy-efficient, unified convolutional neural network (CNN) accelerator is proposed together with a lightweight RGB-D network to achieve real-time, multi-object semantic segmentation in autonomous electric vehicle systems. First, a lightweight Depth-fused Trilateral Network (DTN) is proposed to achieve high accuracy and real-time operation for simultaneous road and multi-object segmentation. Optimized with various types of convolution layers under limited hardware resources, the DTN achieves 94.73% accuracy on the KITTI Road dataset. Second, the unified CNN processor is designed with dual-mode, shift-register-based input reconfiguration units and a layer fusion architecture with two types of processing elements for depth-wise separable convolution (DSC). It supports five types of convolution layers: standard convolution, dilated convolution, transposed convolution, point-wise convolution, and DSC. With this flexible architecture, the processor achieves 17.97× higher throughput on the DTN, and the DSC layer fusion architecture reduces overall external memory access by 34.7%. Implemented in 28 nm CMOS technology, the unified CNN processor consumes 43.6 mW and achieves an energy efficiency of 4.94 TOPS/W. As a result, the proposed system with the DTN achieves a throughput of 40.07 frames per second (fps) for multi-object semantic segmentation on a high-resolution driving-scene dataset.
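To illustrate why DSC suits a lightweight network on limited hardware, the following minimal sketch compares the multiply-accumulate (MAC) count of a standard convolution with that of a DSC (a depth-wise convolution followed by a point-wise convolution). The layer shape and the function names are hypothetical and are not taken from the DTN. The sketch counts arithmetic only and does not model the processor or its layer fusion.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution with an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def dsc_macs(h, w, c_in, c_out, k):
    """MACs for a depth-wise separable convolution with the same output shape."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration (not from the paper).
h, w, c_in, c_out, k = 64, 64, 32, 64, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dsc = dsc_macs(h, w, c_in, c_out, k)
ratio = dsc / std  # analytically equal to 1/c_out + 1/k**2

print(f"standard: {std} MACs, DSC: {dsc} MACs, ratio: {ratio:.4f}")
```

For this assumed shape, the DSC needs roughly one eighth of the MACs of the standard convolution. The intermediate depth-wise output must still be passed to the point-wise stage, which is the kind of traffic a fused DSC datapath can keep on chip.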