As artificial intelligence (AI) becomes increasingly integral to daily life, the demand for high-performance Deep Neural Networks (DNNs) on edge devices has surged. However, deployment is constrained by the "memory wall" in traditional von Neumann architectures, where massive data movement leads to excessive energy consumption and latency. While Computing-in-Memory (CIM) has emerged as a promising paradigm to mitigate these bottlenecks, existing embedded DRAM (eDRAM) implementations face critical challenges: low memory density, susceptibility to process, voltage, and temperature (PVT) variations, substantial ADC overhead, and an inability to effectively exploit data sparsity. This thesis addresses these limitations by proposing two novel analog-digital hybrid eDRAM CIM processors designed for high-density and high-energy-efficiency AI acceleration. First, leveraging the distinct advantages of Ternary Neural Networks (TNNs) in image classification, this work presents HYTEC. HYTEC features a novel transpose ternary eDRAM bitcell with a 3T1C gain-cell structure, achieving 1.58 Mb/mm² density without requiring additional in-cell logic. To resolve inherent analog MAC non-linearity, a Bitcell Gate Voltage Biasing (BGB) scheme dynamically compensates for PVT variations, reducing MAC variation by 89%. Additionally, the processor incorporates a Ternary-bit Per Cycle (TPC) SAR ADC with a shared capacitor DAC to minimize area overhead, alongside an IC-first tiled-convolution strategy. Fabricated in 28 nm CMOS, HYTEC achieves macro and system energy efficiencies of 478 TOPS/W and 273.48 TOPS/W, respectively, demonstrating that high density and robust analog computation can coexist. Furthermore, to extend applicability beyond the limited scope of TNNs and to maximize energy efficiency for general-purpose multi-bit DNNs by leveraging unstructured sparsity, this thesis proposes SERAH-CIM.
This processor introduces a novel Input Activation Grouping Convolution (IGC) scheme that reorders computation sequences to activate only effective rows, skipping zero-weight computations and improving the effective computation ratio by 4.59×. Supporting this is a Hybrid Reversed-MAC Macro (HRMM) capable of dynamically switching between analog and digital modes, coupled with a SAR-Flash ADC. At the system level, Sparsity-aware Proactive Scheduling (SPS) and Multi-Row-Multi-Task (MRMT) control resolve workload imbalances and enable concurrent refresh/update. Consequently, SERAH-CIM achieves a 10.37× improvement in energy efficiency for VGGNet-16 compared to state-of-the-art processors, successfully bridging the gap between theoretical CIM advantages and practical, sparsity-aware hardware implementation.
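
As a rough software illustration of why skipping zero-weight computations raises the share of useful work, the sketch below compares a dense multiply-accumulate with one that visits only nonzero weights. It is a generic zero-skipping example, not the IGC scheme or the HRMM hardware described above; the function names and the sample vectors are hypothetical and chosen only for illustration.

```python
def dense_mac(weights, activations):
    """Multiply-accumulate over every weight, whether it is zero or not."""
    return sum(w * a for w, a in zip(weights, activations))


def zero_skipping_mac(weights, activations):
    """Multiply-accumulate over nonzero weights only.

    Returns the accumulated result and the number of multiplications
    actually performed (hypothetical helper, for illustration only).
    """
    result = 0
    ops = 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue  # a zero weight contributes nothing, so skip it
        result += w * a
        ops += 1
    return result, ops


# Made-up example data: 5 of the 8 weights are zero (unstructured sparsity).
weights = [0, 3, 0, 0, -2, 0, 1, 0]
activations = [5, 1, 7, 2, 4, 9, 6, 3]

result, ops = zero_skipping_mac(weights, activations)
dense_ops = len(weights)
print(f"result={result}, multiplications: {ops} of {dense_ops}")
```

Both functions return the same sum, but the zero-skipping version performs only as many multiplications as there are nonzero weights. In hardware, the benefit depends on how cheaply the nonzero entries can be located and scheduled, which is the problem the abstract attributes to IGC and SPS.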
Publisher
Ulsan National Institute of Science and Technology