Title A Low Area GEMM Accelerator Architecture for Edge Devices
Authors 전영황(Young-Hwang Jeon) ; 김희탁(Hee-Tak Kim) ; 김병수(Byung-Soo Kim) ; 황태호(Tae-Ho Hwang)
DOI https://doi.org/10.5573/ieie.2024.61.7.43
Page pp.43-50
ISSN 2287-5026
Keywords Deep neural network; Edge AI; Edge device; GEMM; Hardware accelerator
Abstract The main drawback of adopting a systolic array for a GEMM accelerator is that the number of compute units grows quadratically with the degree of parallelism: a systolic array needs N² compute units to process N input data in parallel. In this article, we therefore propose an adder-tree-based GEMM accelerator that needs only 2N-1 compute units (N multipliers and N-1 adders) to process N data in parallel, far fewer than a systolic array. We also propose an algorithm that reduces external memory access by reusing data inside the accelerator as much as possible, and a pipelined hardware architecture that enables high throughput. The proposed accelerator uses floating-point units and was synthesized in a 40 nm CMOS process, achieving an area of 49,831.59 µm² and a maximum frequency of 580 MHz.
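The compute-unit counts in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: the function names (`adder_tree_dot`, `gemm`, `systolic_units`, `adder_tree_units`) are hypothetical, and it assumes the adder tree reduces N products pairwise, with an unpaired value passed through to the next level when a level has odd length. It models only the arithmetic, not the pipelining or the data-reuse algorithm.

```python
def systolic_units(n):
    """Compute units in an N x N systolic array, per the abstract: N^2."""
    return n * n


def adder_tree_units(n):
    """Compute units in the adder-tree design, per the abstract: 2N-1."""
    return 2 * n - 1


def adder_tree_dot(a, b):
    """Dot product of two length-N vectors via N multipliers and a pairwise adder tree.

    Returns (result, multipliers_used, adders_used).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")

    # Stage 1: N multipliers operate in parallel.
    level = [x * y for x, y in zip(a, b)]
    multipliers = len(level)

    # Stage 2: reduce pairwise until one value remains.
    adders = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt

    return level[0], multipliers, adders


def gemm(A, B):
    """C = A x B, computing each output element with one adder-tree dot product."""
    cols = list(zip(*B))
    return [[adder_tree_dot(row, col)[0] for col in cols] for row in A]


if __name__ == "__main__":
    result, mults, adds = adder_tree_dot([1, 2, 3, 4], [5, 6, 7, 8])
    print(result, mults, adds)
    print(systolic_units(16), adder_tree_units(16))
```

In this model a length-N dot product always uses N multiplications and N-1 additions, which matches the 2N-1 figure quoted in the abstract. For N = 16 that is 31 units, against 256 for a systolic array of the same width.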