Title |
Design of Lightweight Fully-Connected Network in Hardware Using Learning-Based Low-Rank Approximation and Quantization Techniques |
Authors |
서정윤(Jeong-Yun Seo) ; 이종윤(Jong-Youn Lee) ; 박성준(Sung-Jun Park) ; 이하림(Harim Lee) |
DOI |
https://doi.org/10.5370/KIEE.2025.74.1.149 |
Keywords |
Deep learning; Quantization; Low-rank approximation; Pruning; Verilog HDL |
Abstract |
In this paper, we address the design of an AI hardware accelerator optimized for a lightweight fully-connected network. Compression techniques such as quantization, knowledge distillation, pruning, and low-rank approximation are applied to reduce the number of weights, minimizing memory requirements while maintaining inference performance. We introduce a learning-based low-rank approximation that outperforms the original low-rank approximation. In addition, we analyze the interrelationships among these compression techniques to deepen the understanding of deep learning model compression. To use the decomposed weight matrices in hardware, we design a compressed fully-connected layer, which is then used to construct the lightweight fully-connected network. The proposed hardware design is implemented in Verilog HDL and verified through RTL simulation.
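To make the idea of decomposed weight matrices more concrete, the sketch below shows one common way to low-rank-factorize a trained fully-connected layer and then refine the factor matrices by training, which reflects the general spirit of a learning-based low-rank approximation. It is a minimal illustration only: the class name LowRankLinear, the layer sizes, the rank, and the fine-tuning loop are assumptions for demonstration, not the configuration or the exact method used in the paper.

# Minimal PyTorch sketch: truncated-SVD factorization of a dense layer,
# followed by fine-tuning the factors so the compressed layer recovers
# accuracy lost by plain truncation. Sizes and rank are illustrative.
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Replaces Linear(in_f, out_f) with two rank-r factors, reducing the
    weight count from in_f*out_f to r*(in_f + out_f)."""
    def __init__(self, linear: nn.Linear, rank: int):
        super().__init__()
        W = linear.weight.data                                    # (out_f, in_f)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        sqrt_S = torch.sqrt(S[:rank])
        # W ≈ B @ A with A = sqrt(S_r) Vh_r and B = U_r sqrt(S_r)
        self.A = nn.Parameter(Vh[:rank, :] * sqrt_S.unsqueeze(1)) # (r, in_f)
        self.B = nn.Parameter(U[:, :rank] * sqrt_S.unsqueeze(0))  # (out_f, r)
        self.bias = nn.Parameter(linear.bias.data.clone())

    def forward(self, x):
        # Two small matrix multiplications instead of one large one.
        return x @ self.A.t() @ self.B.t() + self.bias

# Example: compress one layer of a small fully-connected network (sizes assumed).
dense = nn.Linear(784, 256)
compressed = LowRankLinear(dense, rank=32)

# "Learning-based" step: train the factors A and B; here they are fitted to
# reproduce the original layer's outputs on placeholder data.
optimizer = torch.optim.Adam(compressed.parameters(), lr=1e-3)
criterion = nn.MSELoss()
x = torch.randn(64, 784)
with torch.no_grad():
    target = dense(x)
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(compressed(x), target)
    loss.backward()
    optimizer.step()

In practice, the refined factors A and B would be quantized and stored as the two weight memories of the compressed fully-connected layer in hardware, but the exact quantization and RTL mapping follow the paper, not this sketch.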