Processing-in-memory (PIM) has raised as a viable solution for the memory wall crisis and has attracted great interest in accelerating computationally intensive AI applications ranging from filtering to complex neural networks. In this paper, we try to take advantage of both PIM and the residue number system (RNS) as an alternative for the conventional binary number representation to accelerate multiplication-and-accumulations (MACs), primary operations of target applications. The PIM architecture utilizes the maximum internal bandwidth of memory chips to realize a local and parallel computation to eliminates the off-chip data transfer. Moreover, RNS limits inter-digit carry propagation by performing arithmetic operations on small residues independently and in parallel. Thus, we develop a PIM-RNS, entitled PRIMS, and analyze the potential of intertwining PIM architecture with the inherent parallelism of the RNS arithmetic to delineate the opportunities and challenges. To this end, we build a comprehensive device-to-architecture evaluation framework to quantitatively study this problem considering the impact of PIM technology for a well-known three-moduli set as a case study.
Shaahin Angizi, Arman Roohi, MohammadReza Taheri, Deliang Fan