Processing-in-Memory (PIM) has re-emerged as a promising solution to alleviate the memory bottleneck in data-intensive workloads such as machine learning and scientific computing. However, most existing DRAM-based PIM architectures are limited to a few domain-specific kernels such as GEMV, while general-purpose designs suffer from high hardware complexity. In this paper, we propose a programmable near-bank PIM architecture capable of efficiently supporting diverse kernels, including both regular and irregular workloads. We design a Decoupled Access–Execute (DAE) architecture that enables dependency-aware instruction scheduling and host-independent execution, allowing dynamic handling of variable memory latency. Based on the DAE architecture, we design a lightweight SIMD PIM ISA that supports indirect addressing, predication, and permutation operations to effectively map irregular kernels without additional on-chip memory structures. We implement and evaluate the proposed architecture using kernels from the PrIM benchmark that cover different workload characteristics. Experimental results show that the proposed architecture efficiently maps the range of kernels supported by DRAM-based PIM while maintaining moderate hardware overhead relative to the bank area. These results demonstrate the feasibility of a PIM platform that balances programmability and efficiency for memory-bound workloads.
Publisher
Ulsan National Institute of Science and Technology