Title |
Scalable Multi-chip CNN Accelerator Design and Neural Network Partition Technique for Improving Inference Speed |
Authors |
박기태(Gitae Park) ; 박상보(Sangbo Park) ; Hayot Aliev(Hayot Aliev) ; 김형원(HyungWon Kim) |
DOI |
https://doi.org/10.5573/ieie.2024.61.10.127 |
Keywords |
Multi-chip; Convolution neural networks; Accelerator architecture |
Abstract |
As Convolutional Neural Networks (CNNs) have advanced, the size of neural network models has been growing rapidly, making it increasingly challenging to map an entire CNN onto a single accelerator chip. To address this issue, this paper presents a scalable multi-chip CNN accelerator architecture that avoids the astronomical cost of designing and fabricating a huge accelerator in a nanometer-scale process technology. The multi-chip approach is considered a cost-efficient solution for accommodating the rapidly growing size of modern neural networks, because partitioning a large neural network across multiple smaller accelerator chips reduces the design effort for each accelerator and allows the smaller chips to be fabricated in a less expensive process technology. This paper also presents a comparative analysis of different techniques for partitioning a neural network across multiple chips, and demonstrates that an output-channel-based partition is more efficient with respect to inter-chip data transfer time. To show the performance improvement of the proposed multi-chip architecture, we design an example multi-chip accelerator for the object detector CNN model YOLOv5n and implement it on a multi-FPGA platform based on two Xilinx VCU118 FPGAs. The experiment demonstrates that the two-FPGA implementation improves inference speed by 70 % at the cost of a 71 % increase in latency compared with a single-chip implementation. |
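The output-channel partitioning mentioned in the abstract can be illustrated with a minimal sketch (all names and the toy 1x1 convolution are illustrative assumptions, not the paper's implementation): each "chip" receives the full input feature map but only a slice of the layer's filters, so each chip produces complete values for its own output channels and the results only need to be concatenated, with no inter-chip exchange of partial sums.

```python
# Hypothetical sketch of output-channel-based partitioning across chips.
# conv1x1, partition_by_output_channel, and the chip count are illustrative
# assumptions for demonstration, not the paper's accelerator design.

def conv1x1(inputs, filters):
    """Toy 1x1 convolution: inputs is a list of pixels, each a list of
    input-channel values; each filter holds one weight per input channel."""
    return [
        [sum(w * x for w, x in zip(f, pixel)) for f in filters]
        for pixel in inputs
    ]

def partition_by_output_channel(filters, num_chips):
    """Assign a contiguous slice of output channels (filters) to each chip."""
    per_chip = (len(filters) + num_chips - 1) // num_chips
    return [filters[i * per_chip:(i + 1) * per_chip] for i in range(num_chips)]

# Example: 4 pixels, 3 input channels, 6 output channels, 2 chips.
inputs = [[1.0, 2.0, 3.0]] * 4
filters = [[float(c + k) for c in range(3)] for k in range(6)]

# Each chip computes only its assigned output channels on the full input.
chip_outputs = [conv1x1(inputs, part)
                for part in partition_by_output_channel(filters, 2)]

# Merging is a per-pixel concatenation of each chip's output channels.
merged = [sum(parts, []) for parts in zip(*chip_outputs)]
assert merged == conv1x1(inputs, filters)
```

The merge step shows why this partition is cheap on inter-chip bandwidth: only final output activations cross chip boundaries, never partial accumulations.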