Title Improving Cache Locality in Processing-in-Memory Accelerator for Parallel Graph Processing
Authors Hessa Alshamsi; 한태희 (Tae Hee Han)
DOI https://doi.org/10.5573/ieie.2021.58.10.16
Page pp.16-23
ISSN 2287-5026
Keywords Processing in memory; 3D stacked memory; Graph processing; Cache locality; Sorting algorithm
Abstract Processing-in-memory (PIM) is a promising solution to the large data-movement challenge in graph processing: computations are performed on the memory side by placing processing units near memory. However, graph algorithms suffer from poor locality due to their irregular memory access patterns, and this irregularity limits system performance because every cache miss must be served from main memory. In this paper, we propose a method for improving the spatial cache locality of a PIM-based accelerator by managing its message queue. Although the accelerator uses a message-triggered prefetcher for irregular memory accesses, it does not eliminate cache misses and main memory accesses. We develop a sorting algorithm for the message queue to further improve cache utilization and implement it in the gem5 simulator, evaluating two real-world graphs on two graph algorithm benchmarks. Compared with Tesseract [1], the results show an average increase of 11% in the cache miss coverage of the PIM system without increasing energy consumption.
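Note The abstract does not specify the sorting criterion. As a rough, hypothetical illustration of the idea (names and structure are assumptions, not taken from the paper), the C++ sketch below reorders a pending message queue by the cache line of each message's destination vertex, so that messages touching the same or adjacent lines are drained consecutively, which is the kind of reordering that can raise spatial locality and prefetcher effectiveness.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical queued message: each entry targets one vertex, identified
// here by the byte address of its property data in memory.
struct Message {
    uint64_t dest_addr;  // address of the destination vertex's data
    float    payload;    // value to apply (e.g., a rank contribution)
};

constexpr uint64_t kCacheLineBytes = 64;

// Sort the message queue by destination cache line so that messages
// touching the same (or nearby) lines are processed back to back.
// A sketch of the locality-oriented reordering idea, not the paper's
// exact algorithm.
void SortQueueByCacheLine(std::vector<Message>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const Message& a, const Message& b) {
                  return a.dest_addr / kCacheLineBytes <
                         b.dest_addr / kCacheLineBytes;
              });
}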