Title Scalable Multi-chip CNN Accelerator Design and Neural Network Partition Technique for Improving Inference Speed
Authors Gitae Park ; Sangbo Park ; Hayot Aliev ; HyungWon Kim
DOI https://doi.org/10.5573/ieie.2024.61.10.127
Page pp.127-138
ISSN 2287-5026
Keywords Multi-chip; Convolution neural networks; Accelerator architecture
Abstract As Convolutional Neural Networks (CNNs) have advanced, the size of neural network models has grown rapidly, making it increasingly challenging to map an entire CNN onto a single accelerator chip. To address this issue, this paper presents a scalable multi-chip CNN accelerator architecture that avoids the astronomical cost of designing and fabricating a huge accelerator in a nanometer-scale process technology. The multi-chip approach is considered a cost-efficient solution to accommodate the rapidly growing size of modern neural networks, because partitioning a large neural network across multiple smaller accelerator chips reduces the design effort of each accelerator and allows the smaller chips to be fabricated in a less expensive process technology. This paper also presents a comparative analysis of different techniques for partitioning a neural network across multiple chips, and demonstrates that an output-channel based partition is more efficient with respect to inter-chip data transfer time. To show the performance improvement of the proposed multi-chip architecture, we design an example multi-chip accelerator for the object detection CNN model YOLOv5n and implement it on a multi-FPGA platform based on two Xilinx VCU118 FPGAs. The experiment demonstrates that the two-FPGA implementation improves inference speed by 70 % at the cost of a 71 % increase in latency compared with a single-chip implementation.
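The output-channel partition mentioned in the abstract can be illustrated with a minimal sketch. Each "chip" receives the full input feature map but only a slice of the layer's output-channel filters, so each chip produces complete (not partial) output channels and no partial sums need to cross the chip boundary; the final map is a simple concatenation. The function names and the NumPy reference convolution below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def conv2d(x, w):
    # Naive reference convolution: stride 1, no padding.
    # x: (C_in, H, W), w: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for co in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[co, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[co])
    return y

def multichip_conv_output_partition(x, w, n_chips=2):
    # Output-channel partition: split the filters (not the input) across
    # chips. Each chip computes a disjoint slice of output channels from
    # the whole input map, so only the input map is broadcast and the
    # results are concatenated -- no inter-chip partial-sum exchange.
    chip_filters = np.array_split(np.arange(w.shape[0]), n_chips)
    return np.concatenate([conv2d(x, w[s]) for s in chip_filters], axis=0)
```

Because each slice of output channels is computed to completion on one chip, the only inter-chip traffic per layer is the shared input feature map, which is the property the paper exploits when comparing partitioning schemes.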