Title Efficient AI Model Scheduling for Multi-node DPU-GPU Systems
Authors 곽재석(Jaeseok Kwak) ; 노원우(Won Woo Ro)
DOI https://doi.org/10.5573/ieie.2025.62.6.13
Page pp.13-17
ISSN 2287-5026
Keywords Distributed system; Graphics processing unit (GPU); Data processing unit (DPU); Artificial intelligence (AI)
Abstract Rapid advances in artificial intelligence (AI) demand storage capacity and computational power beyond what a traditional single CPU-GPU system can provide. Distributed systems with multiple GPUs are a common way to address this, and the Data Processing Unit (DPU) has been introduced to take over data processing and transfer tasks from the CPU. This paper proposes several techniques for optimizing a multi-node system equipped with multiple DPUs and GPUs so that AI models execute efficiently. Within each node, fine-grained scheduling of data transfers between the DPU and GPU minimizes the gap between data loading and computation. Across nodes, our algorithm selects the optimal parallelism strategy for distributed training based on GPU specifications and model characteristics. Our mechanism achieves an average speedup of 1.39x over a baseline multi-node DPU-GPU system without these optimizations.
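The intra-node idea in the abstract, shrinking the gap between data loading and computation, can be illustrated with a toy timing model. The sketch below is not the paper's scheduler. It assumes a two-stage pipeline in which transfers run back to back, each chunk's computation starts once its own transfer and the previous computation have finished, and buffering is unlimited. The function names and the timings are hypothetical.

```python
def serial_time(load, compute):
    """Total time when every chunk is fully loaded, then computed, one after another."""
    return sum(load) + sum(compute)


def pipelined_time(load, compute):
    """Total time when the transfer of later chunks overlaps the compute of earlier ones.

    Assumptions (illustrative only): transfers are issued back to back,
    and compute on chunk i waits for both its own transfer and the
    compute of chunk i-1.
    """
    load_done = 0.0
    compute_done = 0.0
    for l, c in zip(load, compute):
        load_done += l
        compute_done = max(load_done, compute_done) + c
    return compute_done


# Same total work (12 time units of transfer, 9 of compute),
# split into 3 coarse chunks or 6 fine chunks.
coarse = pipelined_time([4.0] * 3, [3.0] * 3)
fine = pipelined_time([2.0] * 6, [1.5] * 6)
baseline = serial_time([4.0] * 3, [3.0] * 3)
```

In this model, finer chunks shorten the tail during which the GPU computes on the last chunk after all transfers have finished, which is why the fine-grained split finishes earlier than the coarse one even though the total work is identical.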