Title Improving the Execution Speed of Transformer-based Object Tracking Models through Multi-head Attention Parallelization
Authors 김인모(Inmo Kim) ; 김명선(Myungsun Kim)
DOI https://doi.org/10.5573/ieie.2023.60.4.39
Page pp.39-47
ISSN 2287-5026
Keywords Transformer; Multi-head attention; CSWinTT; Object tracking; Multi-threading
Abstract With recent advances in deep learning-based object tracking, the technology is being used in various application fields such as sports game analysis, video security, and augmented reality. Users require not only high tracking accuracy but also high QoS in the form of fast tracking speed. In this study, we improve the object tracking speed of CSWinTT, a transformer-based object-tracking model currently regarded as a state-of-the-art tracking solution. The head operations of the Multi-Head Attention (MHA) in the encoder layer of this model account for most of the execution time of the transformer's entire inference procedure. Although each head operates on a different input, the heads are executed serially. To overcome this, we execute the head operations in parallel: the single MHA module is divided into one sub-module per head, and each sub-module runs in a multi-threading environment. Because a pure Python environment does not guarantee truly concurrent multi-threading, we port the implementation to C++ to enable complete multi-threading. In addition, kernels launched asynchronously by each thread can be executed as concurrently as possible inside the GPU. Across various experiments on the effect of MHA parallel execution, the average execution time of the encoder decreased by 56.8% and the average FPS increased by 63.3% compared to the existing method, while inference accuracy remained almost unchanged.