Title Integrated Hardware-software System for Accelerating Transformer Inference
Authors 이지영(Ji-Young Lee) ; 이소혜(So-Hye Lee) ; 오선희(Seon-Hee Oh) ; 김태환(Tae-Hwan Kim)
DOI https://doi.org/10.5573/ieie.2024.61.7.51
Page pp.51-59
ISSN 2287-5026
Keywords Inference system; FPGA; Transformer; Hardware-software integrated system
Abstract This paper presents an integrated hardware-software system for accelerating transformer inference. We profiled the execution time of each operation in a pure-software implementation of transformer inference to identify the bottleneck. A dedicated hardware unit was implemented to accelerate matrix multiplication. The unit uses an operand-sharing technique to reduce data transfer. In addition, the matrix multiplication is based on the Strassen algorithm to reduce computational complexity. The proposed system achieves an inference speed 12.27 times higher than that of the software-based inference system while maintaining the BLEU score on the Multi30k translation task.
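The abstract names the Strassen algorithm as the basis of the matrix multiplication. As a rough illustration of why it lowers the arithmetic cost, the following pure-Python sketch multiplies two square matrices using seven recursive block products per level instead of eight. It is not the paper's hardware design: the function names (`strassen`, `naive_matmul`, `split`), the `leaf` cutoff parameter, and the fallback to naive multiplication for odd sizes are all assumptions made for this example.

```python
# Illustrative sketch only; not the hardware unit described in the paper.
# All names and the `leaf` cutoff are hypothetical choices for this example.

def mat_add(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def naive_matmul(a, b):
    """Schoolbook matrix product, used as the recursion base case."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def strassen(a, b, leaf=1):
    """Multiply square matrices a and b with Strassen's seven-product scheme.

    Falls back to the naive product when the size is at most `leaf`
    or is odd (an assumption of this sketch; padding is another option).
    """
    n = len(a)
    if n <= leaf or n % 2:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven block products instead of the eight a direct block product needs.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), leaf)
    m2 = strassen(mat_add(a21, a22), b11, leaf)
    m3 = strassen(a11, mat_sub(b12, b22), leaf)
    m4 = strassen(a22, mat_sub(b21, b11), leaf)
    m5 = strassen(mat_add(a11, a12), b22, leaf)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), leaf)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), leaf)
    # Recombine the products into the four quadrants of the result.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives the same result as `naive_matmul` on those inputs. Each recursion level trades one block multiplication for extra block additions and subtractions, which is where the complexity reduction mentioned in the abstract comes from; how the paper maps this onto its hardware unit is not stated in the abstract.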