Title Hardware-software Co-design for Vector Similarity Search on HBM-PIM
Authors Nahyeon Kim; Sujin Kim; Min Jung; Haechannuri Noh; Ji-Hoon Kim
DOI https://doi.org/10.5573/JSTS.2025.25.6.662
Page pp.662-669
ISSN 1598-1657
Keywords Processing-in-memory (PIM); retrieval-augmented generation (RAG); vector similarity search; distance computation; instruction set extension; hardware-software co-design; PIM simulator
Abstract Vector similarity search is a key component of Retrieval-Augmented Generation (RAG) for large language models (LLMs), requiring memory-intensive computations such as Manhattan distance, Euclidean distance, and cosine similarity. Processing-In-Memory (PIM) architectures offer a promising solution to accelerate these memory-bound operations by reducing data movement between memory and processor. This study presents a hardware-software co-design approach for optimizing distance computation on PIM. We first implemented and evaluated a vector similarity search application on a DRAM-based PIM platform using the developed computation library, achieving 44.2% and 59.0% speed improvements for Euclidean distance and cosine similarity, respectively, compared to the CPU. However, instruction set limitations led to performance bottlenecks despite software-level optimization. To address this, we utilized an HBM-based PIM simulator and proposed two new instructions, AMC and MAN, optimized for Euclidean and Manhattan distance computations. Evaluation using a simulator integrated with DRAMSim2 showed that the proposed instructions reduced the total cycle count for distance computations by up to 44% compared to the baseline, with performance gains increasing for larger input sizes. These results demonstrate that both software-level and instruction-level optimizations are essential to fully exploit the performance potential of PIM architectures for distance computation workloads.
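For readers unfamiliar with the three metrics named in the abstract, the following is a minimal reference sketch of Manhattan distance, Euclidean distance, and cosine similarity, together with a brute-force nearest-neighbor search over a small database. It is plain CPU-side Python meant only to illustrate the arithmetic; it is not the paper's PIM computation library, and it does not model the proposed AMC or MAN instructions. The function names and the `nearest` helper are assumptions introduced here for illustration.

```python
import math


def manhattan(a, b):
    """Sum of absolute element-wise differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared differences (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, database, distance=euclidean, k=1):
    """Hypothetical helper: indices of the k database vectors closest to query.

    Every database vector is read once per query, which is why the
    workload is dominated by memory traffic rather than arithmetic.
    """
    ranked = sorted(range(len(database)),
                    key=lambda i: distance(query, database[i]))
    return ranked[:k]
```

Each metric makes a single pass over both vectors and performs only a few arithmetic operations per element, which is consistent with the abstract's description of these computations as memory-bound.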