Title Study of Remote Page Table Entry Caching to Reduce Page Walk Latency in Multi-chip GPUs
Authors Editorial Department (Editor)
DOI https://doi.org/10.5573/ieie.2025.62.7.27
Page pp.27-33
ISSN 2287-5026
Keywords Multi-chip-module GPU; Virtual memory; Address translation; Page table walk; Cache organization
Abstract Due to the slowdown in transistor scaling, multi-chip-module (MCM) GPUs have emerged as a promising solution to enhance the computational power of monolithic GPUs. However, the performance of MCM GPUs is often constrained by slow off-chip interconnects, which bottleneck data transfers between chiplets. In this paper, we propose a technique that caches remote page table entries (PTEs) in the local L2 cache of each chiplet to reduce the frequent remote memory accesses incurred during page table walks. The proposed solution is implemented in two ways: 1) caching remote PTEs only in the local L2 cache, and 2) caching remote PTEs in both the local and remote L2 caches. Our evaluation shows that both techniques reduce page walk latency by more than 51.8% and achieve a speedup of up to 1.7x over the baseline. The results indicate that remote page walks have a significant impact on address translation efficiency in MCM GPUs and demonstrate that an effective PTE caching strategy can substantially enhance overall performance.
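The two caching variants named in the abstract can be illustrated with a toy latency model. This is a minimal sketch under stated assumptions, not the paper's simulator: the latency constants, the `Chiplet` class, the policy names `"none"`, `"local"`, and `"both"`, and the behavior of the baseline (assumed here to never cache a remote PTE in any L2) are all hypothetical, and cache eviction is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in cycles; illustrative only, not taken from the paper.
L2_HIT = 30   # hit in a chiplet's own L2
DRAM = 200    # fetch from the memory attached to the PTE's home chiplet
LINK = 300    # round trip over the inter-chiplet interconnect


@dataclass
class Chiplet:
    # PTE identifiers currently held in this chiplet's L2 (no eviction modeled).
    l2: set = field(default_factory=set)


def pte_access(chiplets, requester, home, pte, policy):
    """Latency of fetching one PTE that lives in chiplet `home`."""
    local_l2 = chiplets[requester].l2
    if pte in local_l2:
        return L2_HIT                      # served without leaving the chiplet
    if home == requester:
        local_l2.add(pte)                  # local PTE: ordinary L2 fill
        return DRAM
    home_l2 = chiplets[home].l2
    latency = LINK + (L2_HIT if pte in home_l2 else DRAM)
    if policy == "both":
        home_l2.add(pte)                   # variant 2: also keep a copy at the home L2
    if policy in ("local", "both"):
        local_l2.add(pte)                  # variants 1 and 2: keep a copy in the requester's L2
    return latency


def page_walk(chiplets, requester, walk, policy):
    """Sum the latency of every PTE fetch in one page walk."""
    return sum(pte_access(chiplets, requester, home, pte, policy)
               for home, pte in walk)


def total_latency(policy):
    """Chiplet 0 walks twice, then chiplet 2 walks once; all PTEs live on chiplet 1."""
    chiplets = [Chiplet() for _ in range(3)]
    walk = [(1, "upper-level-entry"), (1, "leaf-entry")]
    return sum(page_walk(chiplets, r, walk, policy) for r in (0, 0, 2))


if __name__ == "__main__":
    for policy in ("none", "local", "both"):
        print(policy, total_latency(policy))
```

With these made-up constants the totals are 3000, 2060, and 1720 cycles for `"none"`, `"local"`, and `"both"`. The numbers only show the direction of the effect: a local copy removes repeated interconnect crossings for the same requester, and a copy at the home L2 additionally shortens the first access from other chiplets. They say nothing about the magnitudes reported in the paper.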