IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, v.71, no.8, pp.3695 - 3707
Abstract
A sparsity-aware 3D convolutional neural network (3D-CNN) accelerator is proposed for a real-time mobile hand gesture recognition (HGR) system. The heavy computation of 3D convolution on video data makes real-time operation difficult, especially on a resource-constrained mobile platform. To enable real-time HGR, this paper proposes three key features: 1) Spatio-temporal Variation Encoding and an Inter-frame Differential Aware Network, which yield a highly sparse and lightweight network, reducing parameters by 94.03% with only 2.57% accuracy loss on the NvGesture dataset; 2) an ROI-only Computation architecture that exploits activation sparsity to reduce the number of MAC operations and the external memory bandwidth by 84.3% and 72.3%, respectively; and 3) a Weight Sparsity-aware PE and Sparsity-distribution-aware Workload Allocation, which speed up inference by 19.8x. As a result, the low-latency 3D-CNN accelerator exploits both activation and weight sparsity, with data mapping that maximizes reuse in the 3D-CNN, achieving 31x faster inference than the state of the art. The proposed processor is designed in 65 nm CMOS technology, consumes 35 mW of power, and achieves 46.25 TOPS/W of energy efficiency. The system realizes an inference latency of 1.584 ms for real-time HGR on a mobile platform.
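The idea behind inter-frame differencing and ROI-only computation can be illustrated with a minimal Python sketch. The abstract does not describe how the accelerator detects its region of interest, so everything below is an assumption made for illustration: the function names (`frame_difference`, `roi_bounding_box`, `skipped_fraction`), the thresholding rule, and the rectangular bounding box are hypothetical and are not the paper's method. The sketch only shows why a mostly static video leaves few pixels to process.

```python
# Illustrative sketch only: names, threshold, and bounding-box ROI are
# assumptions, not the accelerator's actual algorithm.

def frame_difference(prev, curr, threshold):
    """Return a binary mask: 1 where a pixel changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def roi_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, or None if nothing changed."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def skipped_fraction(frame_shape, roi):
    """Fraction of pixel positions outside the ROI, i.e. work that can be skipped."""
    height, width = frame_shape
    if roi is None:
        return 1.0  # no change at all: the whole frame can be skipped
    top, left, bottom, right = roi
    roi_pixels = (bottom - top + 1) * (right - left + 1)
    return 1.0 - roi_pixels / (height * width)


# Two 4x4 grayscale frames; only two pixels change between them.
prev_frame = [[0, 0, 0, 0] for _ in range(4)]
curr_frame = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
]

mask = frame_difference(prev_frame, curr_frame, threshold=0)
roi = roi_bounding_box(mask)
saved = skipped_fraction((4, 4), roi)
print(roi, saved)
```

In this toy case the ROI covers 4 of 16 pixel positions, so 75% of the positions are skipped. The reductions reported in the abstract (84.3% of MAC operations and 72.3% of external memory bandwidth) come from the authors' own architecture and measurements, not from this sketch.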