Title Improving Multi-DNN Inference Performance under Embedded GPU Environments with Overhead Minimization
Authors 임철순(Cheolsun Lim); 김명선(Myungsun Kim)
DOI https://doi.org/10.5573/ieie.2021.58.10.27
Page pp.27-34
ISSN 2287-5026
Keywords Deep learning; GPU; Multi-DNN execution; Embedded system
Abstract Traditional embedded systems depended entirely on server systems for DNN computation, but the improved performance of embedded GPUs now allows such systems to run DNN workloads on their own. Systems such as unmanned aerial vehicles, smart-city infrastructure, and self-driving cars, which once offloaded DNN operations to the cloud, can therefore run DNN models using only internal resources. In self-driving cars in particular, data from multiple cameras and sensors is fed to applications such as image recognition and lane detection, and each application employs several kinds of DNNs to process it. Running multiple DNNs requires assigning each DNN to a process or a thread, and the overall execution time varies considerably with the execution environment. In this paper, we first analyze the overhead incurred when multiple DNNs are executed in a multi-context environment versus a single-context environment. Second, we analyze the CPU-GPU memory-copy problem that can arise in embedded GPU systems and the cache issue associated with it. Finally, we propose a framework that resolves these problems and minimizes overhead. Applied to a commercial board, the proposed framework reduces running time by up to 40.8% compared to the multi-context environment.
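The abstract contrasts two setups without spelling them out: multiple CUDA contexts (one per process, paying context-creation and context-switch costs) versus a single shared context with per-model streams, and explicit CPU-to-GPU copies versus zero-copy access to the DRAM that integrated embedded GPUs share with the CPU. The CUDA sketch below is a minimal, hedged illustration of the single-context, multi-stream, zero-copy style; it is not the paper's framework, and the kernel, buffer sizes, and two-model count are placeholders.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for one DNN layer; a real model would launch many kernels.
__global__ void dummy_layer(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;  // placeholder computation
}

int main() {
    const int N = 1 << 20;

    // On integrated embedded GPUs (e.g., Jetson-class boards) the CPU and
    // GPU share physical DRAM, so mapped (zero-copy) host memory lets the
    // GPU access buffers directly, avoiding explicit cudaMemcpy traffic.
    // Must be set before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *h_in[2], *h_out[2], *d_in[2], *d_out[2];
    cudaStream_t stream[2];
    for (int m = 0; m < 2; ++m) {
        cudaHostAlloc(&h_in[m],  N * sizeof(float), cudaHostAllocMapped);
        cudaHostAlloc(&h_out[m], N * sizeof(float), cudaHostAllocMapped);
        cudaHostGetDevicePointer(&d_in[m],  h_in[m],  0);
        cudaHostGetDevicePointer(&d_out[m], h_out[m], 0);
        for (int i = 0; i < N; ++i) h_in[m][i] = 1.0f;  // dummy input
        cudaStreamCreate(&stream[m]);  // one stream per model, one shared context
    }

    // Both "models" run concurrently inside a single CUDA context; with one
    // process per model, each would instead carry its own context and pay
    // context-creation and context-switch overhead.
    for (int m = 0; m < 2; ++m)
        dummy_layer<<<(N + 255) / 256, 256, 0, stream[m]>>>(d_in[m], d_out[m], N);

    cudaDeviceSynchronize();
    for (int m = 0; m < 2; ++m) {
        cudaStreamDestroy(stream[m]);
        cudaFreeHost(h_in[m]);
        cudaFreeHost(h_out[m]);
    }
    printf("out[0] of model 0 = %f\n", h_out[0][0]);
    return 0;
}
```

One caveat this sketch glosses over: mapped host memory typically bypasses or weakens caching on embedded boards, which is the kind of cache issue the paper analyzes alongside the copy overhead.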