Abstract:
In traditional computer architectures, some applications suffer from the "memory wall" problem: workloads that are dominated by data movement and exhibit low data reuse saturate the memory bandwidth of the system. The processor then spends most of its time waiting on memory, which significantly degrades both performance and energy efficiency. Processing-In-Memory (PIM) is an emerging technology that aims to overcome this problem by performing computation in processors placed very close to the memory. The resulting large reduction in memory access latency and increase in aggregate memory bandwidth make it possible to accelerate such workloads. Breadth-First Search (BFS) is a graph algorithm that is increasingly affected by the memory wall as the graph size grows, and could therefore benefit considerably from this architecture. In this work, we explore PIM acceleration of BFS workloads using the Data Processing Units (DPUs) of UPMEM's PIM architecture. We characterize the performance and scalability of BFS on DPUs and compare it with a traditional CPU-based implementation.