| Title |
HLS-based Integer-only Lightweight YOLOv3-Tiny FPGA Accelerator Design |
| Authors |
김종현(Jonghyun Kim) ; 강진구(Jin-Ku Kang) ; 김용우(Yongwoo Kim) |
| DOI |
https://doi.org/10.5573/ieie.2026.63.3.75 |
| Keywords |
CNN; Object detection; YOLOv3-Tiny; FPGA implementation; HLS (High-Level Synthesis) |
| Abstract |
In computer vision, CNN-based object detectors are widely used because of their high accuracy. However, achieving high accuracy typically requires many parameters and heavy computation, which complicates deployment in embedded environments with tight resource budgets. To address this issue, research has focused on both model optimization and hardware architecture design. On the model side, model optimization modifies the original architecture to improve accuracy, but some optimized models still have more parameters than existing lightweight baselines, making them difficult to deploy in memory-constrained environments. On the hardware side, some accelerators load all parameters into on-chip memory for fast computation, while others rely on data tiling to make better use of hardware resources. The former can easily exceed the available on-chip resources, and the latter incurs frequent DRAM accesses, increasing latency and power consumption. In this paper, we propose a model architecture that reconstructs the network with a new computational block combining standard and depthwise convolutions, reducing model size while preserving accuracy. We also propose a hardware architecture that, based on each layer's characteristics, chooses between Linebuffer-based and Planebuffer-based computation to balance on-chip memory usage against DRAM traffic. Applied to YOLOv3-Tiny, the proposed model optimization reduces the parameter count by 80.8% while achieving 62.1% accuracy. Implemented on a Xilinx ZC706 FPGA board, the proposed accelerator uses 314 BRAM18K blocks, 306 DSPs, 25.9k FFs, and 17.9k LUTs. |
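The abstract names two ideas without specifying their exact form: a block that mixes standard and depthwise convolutions, and a per-layer choice between Linebuffer-based and Planebuffer-based computation. The sketch below illustrates only the general arithmetic behind each, under illustrative assumptions. The depthwise-separable pair (a k x k depthwise conv followed by a 1 x 1 pointwise conv) is a textbook stand-in, not the authors' block. The "keep the whole plane on chip if it fits" rule, the 256 KiB budget, the 8-bit activation width, and the layer shapes are all hypothetical, and the helper names (`conv_params`, `dw_separable_params`, `choose_buffer`, and so on) are invented for this example.

```python
from dataclasses import dataclass


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias and batch-norm terms ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv.

    Textbook depthwise-separable pair, used here only as a stand-in for the
    paper's mixed standard/depthwise block, whose exact structure is not given.
    """
    return k * k * c_in + c_in * c_out


@dataclass
class Layer:
    name: str
    height: int    # input feature-map height
    width: int     # input feature-map width
    channels: int  # input channels
    kernel: int    # square kernel size k


def line_buffer_bytes(layer: Layer, bytes_per_act: int = 1) -> int:
    # A k x k sliding window needs k rows of the input plane on chip
    # (some designs keep k - 1 rows plus the incoming row; this sketch uses k).
    # bytes_per_act = 1 assumes 8-bit integer activations (an assumption).
    return layer.kernel * layer.width * layer.channels * bytes_per_act


def plane_buffer_bytes(layer: Layer, bytes_per_act: int = 1) -> int:
    # The entire input feature map is held on chip, so it is fetched from DRAM once.
    return layer.height * layer.width * layer.channels * bytes_per_act


def choose_buffer(layer: Layer, budget_bytes: int, bytes_per_act: int = 1) -> str:
    """Hypothetical selection rule: use a plane buffer when the whole input
    plane fits in the on-chip budget, otherwise stream rows through a line buffer."""
    if plane_buffer_bytes(layer, bytes_per_act) <= budget_bytes:
        return "plane"
    return "line"


if __name__ == "__main__":
    std = conv_params(3, 256, 512)
    sep = dw_separable_params(3, 256, 512)
    print(std, sep, f"{1 - sep / std:.1%}")  # 1179648 133376 88.7%

    budget = 256 * 1024  # hypothetical 256 KiB on-chip activation budget
    # Illustrative shapes: a large early feature map and a small late one.
    for layer in [Layer("early", 208, 208, 16, 3), Layer("late", 13, 13, 512, 3)]:
        print(layer.name, choose_buffer(layer, budget))  # early line / late plane
```

In this sketch the trade-off follows from the two footprints: a line buffer grows with k·W·C rather than H·W·C, so it suits large, shallow early feature maps, while small, deep late feature maps can be held whole and reused without re-fetching from DRAM.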