Title |
Integrated Hardware-software System for Accelerating Transformer Inference |
Authors |
이지영(Ji-Young Lee) ; 이소혜(So-Hye Lee) ; 오선희(Seon-Hee Oh) ; 김태환(Tae-Hwan Kim) |
DOI |
https://doi.org/10.5573/ieie.2024.61.7.51 |
Keywords |
Inference system; FPGA; Transformer; Hardware-software integrated system |
Abstract |
This paper presents an integrated hardware-software system for accelerating transformer inference. We profiled the execution time of each operation in a pure-software implementation of transformer inference to identify the bottleneck. A dedicated hardware unit was then implemented to accelerate matrix multiplication. The unit reduces data transfer by means of an operand-sharing technique. In addition, matrix multiplication is implemented using the Strassen algorithm to reduce computational complexity. The proposed system achieves an inference speed 12.27 times that of the software-based inference system while maintaining the BLEU score on the Multi30k translation task. |
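The abstract names the Strassen algorithm as the basis for the matrix multiplication but does not describe how it is mapped to hardware. As a rough illustration only, the textbook recursion can be sketched in software as below. The function names (`strassen`, `naive_mul`, `split`) and the `cutoff` parameter are hypothetical, square matrices are assumed, and nothing here reflects the authors' FPGA design or their operand-sharing technique.

```python
# Minimal software sketch of the textbook Strassen recursion.
# Assumptions (not from the paper): square matrices stored as lists of rows;
# odd-sized blocks fall back to the ordinary triple-loop product.

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def naive_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def split(m):
    # Split a square matrix of even size into four equal quadrants.
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def strassen(a, b, cutoff=1):
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_mul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of the eight a plain block product needs.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Reassemble the four quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

Each recursion level trades one block multiplication for extra block additions, which is where the reduction in multiplication count comes from. A practical implementation would typically use a larger `cutoff` so that small blocks go straight to the ordinary product.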