Title |
Improving the Efficiency of Deep Learning Applications in a Multi-GPU Environment via Adaptive Mini-batch Data Redistribution |
Authors |
김인모(Inmo Kim); 김명선(Myungsun Kim)
DOI |
https://doi.org/10.5573/ieie.2022.59.9.51 |
Keywords |
Fair-share scheduling; Deep learning applications; Multi-GPU; Slow-down |
Abstract |
Multiple DNN training applications are usually executed on a GPU cluster composed of several GPUs. In this environment, if the total amount of GPU resources is smaller than the amount required by all training applications, competition among the applications for GPU resources is inevitable. Depending on the degree of this competition and on how each application uses each GPU, some applications may finish training quickly while others take much longer. This study proposes an algorithm that resolves such unfair allocation of GPUs to training applications. Based on each application's current GPU resource utilization, the ratio of its predicted training completion time to the completion time it would achieve with exclusive use of the GPUs is calculated and defined as that application's current Slow-Down. The per-GPU mini-batch data ratio of each training application is then periodically adjusted so that the Slow-Down values of the currently running applications become similar, achieving a fair allocation of GPU resources. Experiments confirm that the maximum Slow-Down decreased by more than 53% and overall GPU utilization increased by 25%.
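The sketch below illustrates the two ideas named in the abstract: the Slow-Down metric (predicted completion time under sharing divided by the exclusive-use completion time) and a periodic redistribution of an application's mini-batch across its GPUs. The data structure, field names, and the throughput-proportional splitting heuristic are illustrative assumptions for exposition, not the paper's actual algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AppStatus:
    """Snapshot of one training application (hypothetical fields)."""
    name: str
    exclusive_time: float              # estimated completion time with exclusive GPU use
    predicted_time: float              # predicted completion time under current sharing
    gpu_throughput: Dict[str, float]   # measured samples/sec on each assigned GPU


def slow_down(app: AppStatus) -> float:
    """Slow-Down = predicted completion time under sharing / exclusive completion time."""
    return app.predicted_time / app.exclusive_time


def rebalance_minibatch(app: AppStatus, global_batch: int) -> Dict[str, int]:
    """Split the application's global mini-batch across its GPUs in proportion
    to each GPU's measured throughput, so contended (slower) GPUs receive
    fewer samples and stop dominating the synchronous iteration time.
    Rounding may make the shares sum to slightly more or less than global_batch."""
    total = sum(app.gpu_throughput.values())
    return {gpu: max(1, round(global_batch * tput / total))
            for gpu, tput in app.gpu_throughput.items()}


def redistribution_round(apps: List[AppStatus], global_batches: Dict[str, int]) -> None:
    """One periodic adjustment round: report each application's current
    Slow-Down (worst first) and recompute its per-GPU mini-batch shares."""
    for app in sorted(apps, key=slow_down, reverse=True):
        shares = rebalance_minibatch(app, global_batches[app.name])
        print(f"{app.name}: slow-down = {slow_down(app):.2f}, shares = {shares}")
```

In a real deployment the controller would repeat such a round on a fixed period, using freshly measured utilization and throughput, so that the Slow-Down values of all running applications converge toward one another.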