Design of an Approximate 4-2 Compressor with Error Recovery for Efficient Approximate Multiplication
Sungyoun Hwang¹
Hyelin Seok¹
Yongtae Kim†
(The School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Approximate computing, approximate multiplier, approximate compressor, error recovery, energy efficiency
I. INTRODUCTION
Device miniaturization and the prevalence of battery-powered devices are increasingly evident today [1]. At the same time, these devices increasingly execute applications that demand complex data processing. This has raised their power and energy consumption and prompted researchers to explore new ways to reduce hardware resource consumption. In such contexts, deliberately sacrificing accuracy to reduce energy consumption is an appealing technique, and approximate computing has emerged as a promising approach for balancing energy efficiency and precision [2]. In particular, multimedia processing, which includes image and video processing, audio compression, and speech recognition, is a representative application domain where approximate computing can be readily applied [3-11]. These tasks primarily process visual and auditory information and are therefore less sensitive to computational errors, because the inherent limitations of human perception often prevent minor discrepancies or distortions from being noticed [12]. Hence, by sacrificing precision judiciously, the power and energy efficiency of multimedia devices can be significantly enhanced without noticeable loss in processing quality.
While arithmetic operations serve as the foundation for numerous computational tasks,
multiplication stands out as one of the most fundamental and frequently utilized arithmetic
operations [13]. Multipliers in digital systems often demand extensive computational resources, leading
to substantial power consumption and latency. To address these challenges, researchers
have explored the design of energy-efficient approximate multipliers tailored for
error-tolerant applications [14-17]. By tolerating acceptable levels of error in their outputs, approximate multipliers trade computational accuracy for hardware efficiency, with the potential to reduce power consumption and enhance throughput.
One highly effective methodology for constructing an approximate multiplier is the use of approximate compressors [18-23]. Compressor-based multiplication comprises three stages: 1) partial product generation; 2) partial product reduction; and 3) summation of the final two rows of partial products. In this approach, the approximate compressors used in the second stage play a crucial role, significantly affecting the overall performance of the multiplier. Accordingly, to balance the accuracy and hardware resource consumption of approximate multipliers, a considerable number of approximate 4-2 compressors have been introduced in the literature [24-27].
Momeni et al. introduced a pair of approximate 4-2 compressors, one of which eliminated
the input bit C$_{in}$ and output bit C$_{out}$ of an exact 4-2 compressor, resulting
in a significant reduction in hardware resource consumption [24]. This design approach, characterized by its simplicity and excellent performance,
had a notable impact on subsequent compressors. Akbari et al. proposed a set of four
dual-quality approximate 4-2 compressor designs [25]. These compressors were integrated into their proposed multiplier architecture, which
facilitates switching between accurate and approximate operational modes to effectively
adjust approximate multiplication accuracy. Additionally, Venkatachalam et al. presented
a novel approach featuring approximate versions of a half adder, a full adder, and
a 4-2 compressor to realize an approximate multiplier considering area, power, and
accuracy [26]. Furthermore, Pei et al. introduced approximate multipliers built from three new cost-effective approximate compressors and an error compensation module that reduces the overall error distance of the multiplier output under specific input conditions [27].
In this paper, we present novel approximate multiplier designs based on an efficient 4-2 compressor, demonstrating substantial improvements in computational accuracy with minimal hardware overhead. We systematically analyze a conventional approximate compressor and significantly enhance its performance by introducing error recovery logic. When implemented in a 32-nm CMOS technology, the proposed multiplier reduces area, power, and energy by up to 25.2%, 22.9%, and 24.0%, respectively, compared with the approximate multipliers considered in this paper. The main contributions of this paper are summarized as follows:
• We systematically analyze an existing 4-2 compressor and propose a novel error recovery logic that reduces its error distance and markedly enhances overall accuracy.
• We present two approximate multiplier designs that use different configurations of the proposed compressors and evaluate their accuracy and hardware performance.
• To validate the efficacy of the proposed multiplier, we integrate it into an image processing application and compare its outcomes with those of existing approximate multipliers.
II. PROPOSED APPROXIMATE COMPRESSOR AND MULTIPLIER DESIGNS
In this section, we briefly review the conventional Momeni 4-2 compressor, outlining its characteristics and limitations. We then propose a novel approximate compressor designed to overcome the shortcomings of this conventional design. Our compressor integrates cost-effective error recovery logic that rectifies errors occurring under a specific input condition, thereby enhancing overall accuracy. Furthermore, we present approximate multiplier architectures based on the proposed compressor.
1. Proposed Approximate 4-2 Compressor with Error Recovery Logic
The Momeni compressor is an approximate 4-2 compressor that simplifies the design of an exact 4-2 compressor by eliminating its C$_{in}$ input and C$_{out}$ output [24]. This simplification reduces logic complexity and hardware cost, requiring only three NOR gates for Carry generation and two XNOR gates plus one OR gate for Sum computation. Seok et al. further optimized the Momeni compressor by leveraging compound gates, which are more cost-effective than equivalent combinations of regular gates [28]. To this end, De Morgan's law was employed to minimize the Carry and Sum expressions, so that the Seok compressor requires only two compound gates and a NOT gate (i.e., an inverter). Although the Momeni and Seok compressors enable hardware-efficient approximate multipliers, they suffer from a significant drawback: poor accuracy. Table 1 presents the truth table of the Momeni compressor for all possible input combinations. This compressor produces erroneous outputs when the input X$_{\mathrm{4\colon 1}}$ is 0000, 0011, 1100, or 1111. The compressor's inputs are partial products of the two multiplier operands, and because each partial product bit is the AND of two operand bits, each bit is 0 with probability 3/4 and 1 with probability 1/4. Consequently, under uniformly distributed multiplier inputs, the probability of each compressor input pattern depends on its numbers of zeros and ones. For instance, the input pattern X$_{\mathrm{4\colon 1}}$ = 0101 occurs with probability 3/4 ${\times}$ 1/4 ${\times}$ 3/4 ${\times}$ 1/4 = 9/256, or about 3.5%, as shown in Table 1. Our key observation is that the Momeni compressor produces an erroneous Sum for the most likely input pattern, X$_{\mathrm{4\colon 1}}$ = 0000, which occurs with probability 81/256 (31.6%), thereby substantially degrading the overall accuracy of approximate multiplication.
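The probability argument above can be checked by enumerating all 16 input patterns. The sketch below is our own illustrative model, not the authors' code: it assumes the gate-level Momeni equations implied by the text and Table 1 (Carry = (X1 + X2)(X3 + X4) from three NOR gates; Sum = XNOR(X1, X2) OR XNOR(X3, X4) from two XNOR gates and one OR gate) and independent input bits that are 1 with probability 1/4.

```python
from fractions import Fraction
from itertools import product


def momeni_compressor(x4, x3, x2, x1):
    """Approximate 4-2 compressor without C_in/C_out (assumed Momeni form).

    Carry = (x1 + x2)(x3 + x4)          -> NOR(NOR(x1, x2), NOR(x3, x4))
    Sum   = XNOR(x1, x2) + XNOR(x3, x4) -> two XNOR gates and one OR gate
    """
    carry = (x1 | x2) & (x3 | x4)
    s = (1 - (x1 ^ x2)) | (1 - (x3 ^ x4))
    return carry, s


def pattern_probability(bits, p_one=Fraction(1, 4)):
    """Probability of a pattern when each bit is the AND of two uniform bits."""
    prob = Fraction(1)
    for b in bits:
        prob *= p_one if b else 1 - p_one
    return prob


rows = []
for bits in product((0, 1), repeat=4):  # bits = (x4, x3, x2, x1), 0000 first
    carry, s = momeni_compressor(*bits)
    ed = (2 * carry + s) - sum(bits)  # approximate value minus exact count
    rows.append((bits, carry, s, ed, pattern_probability(bits)))

error_probability = sum(p for _, _, _, ed, p in rows if ed != 0)
mean_error = sum(ed * p for _, _, _, ed, p in rows)

for bits, carry, s, ed, p in rows:
    if ed != 0:
        print("".join(map(str, bits)), f"Carry={carry} Sum={s} ED={ed:+d} P={p}")
print("error probability:", error_probability)  # 25/64, i.e. 100/256
print("mean error distance:", mean_error)  # 31/128, i.e. 62/256
```

Only 0000, 0011, 1100, and 1111 are reported, and 0000 alone contributes 81 of the 100/256 total error probability, which is why correcting that single pattern is worthwhile.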
Table 1. The truth table of the Momeni compressor
X4 | X3 | X2 | X1 | Carry | Sum | Error Distance | Probability
0 | 0 | 0 | 0 | 0 | 1 | +1 | 81/256
0 | 0 | 0 | 1 | 0 | 1 | 0 | 27/256
0 | 0 | 1 | 0 | 0 | 1 | 0 | 27/256
0 | 0 | 1 | 1 | 0 | 1 | -1 | 9/256
0 | 1 | 0 | 0 | 0 | 1 | 0 | 27/256
0 | 1 | 0 | 1 | 1 | 0 | 0 | 9/256
0 | 1 | 1 | 0 | 1 | 0 | 0 | 9/256
0 | 1 | 1 | 1 | 1 | 1 | 0 | 3/256
1 | 0 | 0 | 0 | 0 | 1 | 0 | 27/256
1 | 0 | 0 | 1 | 1 | 0 | 0 | 9/256
1 | 0 | 1 | 0 | 1 | 0 | 0 | 9/256
1 | 0 | 1 | 1 | 1 | 1 | 0 | 3/256
1 | 1 | 0 | 0 | 0 | 1 | -1 | 9/256
1 | 1 | 0 | 1 | 1 | 1 | 0 | 3/256
1 | 1 | 1 | 0 | 1 | 1 | 0 | 3/256
1 | 1 | 1 | 1 | 1 | 1 | -1 | 1/256
To improve accuracy by rectifying the output for the input pattern X$_{\mathrm{4\colon 1}}$ = 0000, we introduce a dedicated error recovery logic for this compressor. The recovery logic changes the Sum output from 1 to 0 when X$_{\mathrm{4\colon 1}}$ equals 0000, correcting the Sum signal while leaving the Carry signal unchanged. Therefore, the truth table of the proposed compressor is identical to that of the Momeni compressor except for the input X$_{\mathrm{4\colon 1}}$ = 0000. The proposed compressor is characterized by the following equations:
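A Boolean form consistent with Table 1 and with the correction described above is given below. It is our reconstruction from the text rather than the authors' published expressions, which may be written differently (for example, in the De Morgan forms used for compound-gate mapping); grouping the recovery term as the OA21 input (X_4 + (X_1 + X_2 + X_3)) is likewise an assumption about Fig. 1.

```latex
\begin{aligned}
\mathrm{Carry} &= (X_1 + X_2)\,(X_3 + X_4) \\
\mathrm{Sum}_{\mathrm{M}} &= \overline{X_1 \oplus X_2} + \overline{X_3 \oplus X_4} \\
\mathrm{Sum} &= \mathrm{Sum}_{\mathrm{M}} \cdot (X_1 + X_2 + X_3 + X_4)
  = \bigl(X_4 + (X_1 + X_2 + X_3)\bigr) \cdot \mathrm{Sum}_{\mathrm{M}}
\end{aligned}
```

Here Sum_M is the Momeni Sum. The factor (X_1 + X_2 + X_3 + X_4) is 0 only for X_{4:1} = 0000, so Sum is forced to 0 in that case and is unchanged otherwise, while Carry equals the Momeni Carry.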
Realizing these equations in digital logic requires only a few more logic gates than the conventional counterpart. Fig. 1 illustrates the implementation of the proposed compressor. Whereas the conventional compressor employs two compound gates (AO221 and OA22) and an inverter, the proposed design adds a compound gate (OA21) and a three-input OR gate for error recovery, as depicted in Fig. 1. Alternatively, the error recovery logic can be realized with a four-input OR gate and a two-input AND gate; in the rest of this paper, we use the former of these two configurations. Consequently, the proposed compressor yields the same Sum and Carry outputs as the conventional design for all input patterns listed in Table 1 except X$_{\mathrm{4\colon 1}}$ = 0000, for which it generates the exact Sum and Carry values, both 0, thereby enhancing the overall accuracy.
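The claim that the recovery logic alters only the 0000 case can be verified exhaustively. This sketch assumes the Momeni gate equations consistent with Table 1 and models OA21 as (A + B)·C with an assumed input assignment. It compares both recovery variants (three-input OR plus OA21, and four-input OR plus two-input AND) against the Momeni Sum and computes the residual error statistics under the same 1/4 bit probability used for Table 1.

```python
from fractions import Fraction
from itertools import product


def momeni_sum(x4, x3, x2, x1):
    # Momeni Sum: XNOR(x1, x2) OR XNOR(x3, x4)
    return (1 - (x1 ^ x2)) | (1 - (x3 ^ x4))


def carry(x4, x3, x2, x1):
    # Carry is left unchanged by the error recovery logic
    return (x1 | x2) & (x3 | x4)


def oa21(a, b, c):
    # OR-AND compound gate: (a + b) . c
    return (a | b) & c


def proposed_sum_oa21(x4, x3, x2, x1):
    # Variant used in this paper: a three-input OR feeding an OA21
    # (the exact OA21 input assignment is an assumption)
    return oa21(x4, x3 | x2 | x1, momeni_sum(x4, x3, x2, x1))


def proposed_sum_and(x4, x3, x2, x1):
    # Alternative variant: four-input OR gated by a two-input AND
    return (x4 | x3 | x2 | x1) & momeni_sum(x4, x3, x2, x1)


def probability(bits):
    # Each partial-product bit is 1 with probability 1/4
    p = Fraction(1)
    for b in bits:
        p *= Fraction(1, 4) if b else Fraction(3, 4)
    return p


changed, variants_agree = [], True
error_probability, mean_error = Fraction(0), Fraction(0)
for bits in product((0, 1), repeat=4):  # bits = (x4, x3, x2, x1)
    s_new = proposed_sum_oa21(*bits)
    variants_agree &= s_new == proposed_sum_and(*bits)
    if s_new != momeni_sum(*bits):
        changed.append("".join(map(str, bits)))
    ed = 2 * carry(*bits) + s_new - sum(bits)
    if ed != 0:
        error_probability += probability(bits)
    mean_error += ed * probability(bits)

print("patterns whose Sum changed:", changed)  # ['0000']
print("both variants agree:", variants_agree)  # True
print("error probability:", error_probability)  # 19/256
print("mean error distance:", mean_error)  # -19/256
```

By Table 1, the Momeni compressor errs with probability 100/256; after recovery, only 0011, 1100, and 1111 remain erroneous, for a combined probability of 19/256.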
Fig. 1. Proposed approximate 4-2 compressor.
2. Proposed Approximate Multiplier Architecture
To build an N${\times}$N approximate multiplier using compressors, two primary schemes
for partial product reduction, namely the C-N and C-FULL configurations, are commonly
employed. These reduction schemes aim to reduce the partial product matrix height
to two rows, which are then summed up by an adder to produce the final multiplication
output. The C-N configuration incorporates approximate 4-2 compressors solely in the
N least significant columns of the partial product matrix to minimize errors. On the
other hand, the C-FULL configuration utilizes compressors across all columns in the
matrix to optimize hardware efficiency at the expense of reduced accuracy.
Fig. 2 shows the proposed architectures of approximate multipliers for 8${\times}$8 multiplication, using two different partial product reduction schemes. In the C-N configuration, illustrated in Fig. 2(a), the higher-order N-1 columns employ exact 4-2 compressors, full adders, and half adders to compute the N most significant bits (MSBs) of the product with exact reduction logic, while the lower-order N columns use approximate compressors along with half adders to produce the N least significant bits (LSBs) approximately. Importantly, the proposed compressors are deployed only in the most significant of the N least significant columns of the partial product matrix, while the remaining columns use the conventional compressor designs. This configuration enhances overall accuracy while maintaining hardware efficiency. In the C-FULL configuration, as depicted in Fig. 2(b), we replace the traditional compressors with the proposed ones in the N-1 most significant columns to enhance accuracy, while the traditional compressors remain unchanged in the N least significant columns. Consequently, in the proposed multiplier architectures, three and eight conventional compressors are replaced with the proposed compressors equipped with error recovery logic in the C-N and C-FULL configurations, respectively. Our multiplier architecture is scalable and can be readily applied to multiplications wider than 8${\times}$8.
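The split into N least significant and N-1 most significant columns follows from the shape of the partial product matrix. The short sketch below (our illustration; the function names are hypothetical) computes the number of partial-product bits per column of an unsigned N×N multiplier and divides the 2N-1 columns as in the C-N configuration. It does not model the actual compressor placement of Fig. 2.

```python
def column_heights(n):
    """Partial-product bits per column of an n x n unsigned multiplier.

    Column j (LSB first) collects the bits a_i * b_k with i + k = j,
    so there are 2n - 1 columns and the tallest column, j = n - 1, holds n bits.
    """
    return [min(j + 1, 2 * n - 1 - j) for j in range(2 * n - 1)]


def c_n_split(n):
    """Return (n least significant columns, n - 1 most significant columns)."""
    heights = column_heights(n)
    return heights[:n], heights[n:]


lsb_cols, msb_cols = c_n_split(8)
print("approximate (LSB) columns:", lsb_cols)  # [1, 2, 3, 4, 5, 6, 7, 8]
print("exact (MSB) columns:      ", msb_cols)  # [7, 6, 5, 4, 3, 2, 1]
print("total partial products:   ", sum(lsb_cols + msb_cols))  # 64
```

For N = 8, the most significant of the eight LSB columns is also the tallest column of the matrix (eight bits), which is where the C-N variant places the proposed compressors.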
Fig. 2. Proposed approximate multiplier architecture using two partial product reduction schemes for 8×8 multiplication: (a) C-N configuration using the proposed compressor only in the most significant column of the N least significant columns of the partial product matrix; (b) C-FULL configuration using the proposed compressor in the N-1 most significant columns.
III. EXPERIMENTAL RESULTS
In this section, we comprehensively assess the performance of the proposed multipliers, which incorporate the proposed compressor with error recovery logic, focusing on both hardware efficiency and computational accuracy. We compare the proposed multiplier against existing counterparts to demonstrate the advantages of our design. The counterparts are five other cost-effective approximate 4-2 compressor-based multipliers, namely Momeni [24], Akbar1 and Akbar2 [25], Venka [26], and Pei [27].
To assess and compare hardware performance, our 8${\times}$8 multiplier, along with
the other five same-sized multiplier designs, was designed in Verilog HDL and synthesized
using Synopsys Design Compiler in a 32-nm CMOS technology. Furthermore, the accuracy
performance was evaluated through software-based simulations for all the multiplier
designs.
1. Computation Accuracy Performance Analysis
The normalized mean error distance (NMED) and mean relative error distance (MRED)
serve as standard error metrics for evaluating the accuracy performance of approximate
multipliers. These metrics were calculated for all considered multipliers across the
entire range of possible input combinations, as defined by the following equations:
where n is the bit width of each multiplier operand, and the error distance is defined as ED$_{i}$ = |M$_{i}$ - M′$_{i}$|, in which M$_{i}$ and M′$_{i}$ denote the exact and approximate outputs for the $i$-th input pair, respectively [23].
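For reference, commonly used forms of these metrics, written with the symbols defined above, are given below. Two conventions here are assumptions rather than details stated in this section: NMED normalizes the mean error distance by the largest exact product, (2^n - 1)^2, and MRED skips inputs whose exact product is zero.

```latex
\begin{aligned}
\mathrm{NMED} &= \frac{1}{(2^{n}-1)^{2}} \cdot \frac{1}{2^{2n}} \sum_{i=1}^{2^{2n}} \mathrm{ED}_{i}, \\
\mathrm{MRED} &= \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \frac{\mathrm{ED}_{i}}{M_{i}},
  \qquad \mathcal{S} = \{\, i : M_{i} \neq 0 \,\}
\end{aligned}
```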
Fig. 3 shows the NMED and MRED values of various multipliers under both the C-N and C-FULL
configurations. Under the C-N configuration, our proposed multiplier demonstrates
superior performance in NMED compared to the other five approximate alternatives while
the Pei multiplier exhibits the worst NMED performance.
Fig. 3. Accuracy performance comparison: (a) NMED of multipliers with C-N configuration; (b) MRED of multipliers with C-N configuration; (c) NMED of multipliers with C-FULL configuration; (d) MRED of multipliers with C-FULL configuration.
Moreover, the MRED of the proposed multiplier is comparable to that of the Akbar1 multiplier, lying between the values of the Momeni/Pei and Akbar2/Venka multipliers. Specifically, compared with the Pei multiplier, which exhibits the worst NMED and the second-worst MRED, our proposed multiplier reduces NMED and MRED by 79.94% and 49.27%, respectively. It also achieves reductions of 23.84% in NMED and 49.94% in MRED compared with the Momeni multiplier. A similar trend is observed under the C-FULL configuration: our proposed multiplier outperforms the other counterparts in NMED, while the Pei multiplier performs worst. In MRED, our multiplier matches the Akbar2 and Venka multipliers, which together achieve the best values, while the Momeni multiplier exhibits the worst. Overall, the proposed multiplier achieves reductions of up to 89.81% and 97.10% in NMED and MRED, respectively, compared with the other five counterparts considered in this paper.
Overall, the advantage of the proposed multiplier is more pronounced under the C-FULL configuration than under the C-N configuration. As depicted in Fig. 3, although the NMED of our multiplier is comparable to those of the Akbar2 and Venka multipliers under the C-N configuration, it is 51.40% and 48.26% lower than theirs, respectively, under the C-FULL configuration. In summary, the proposed error recovery logic effectively enhances the accuracy of the multiplier by rectifying output errors of the compressor.
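A minimal exhaustive evaluation of NMED and MRED, in the spirit of the software simulations described in this section, is sketched below. It assumes that NMED normalizes the mean error distance by the largest exact product (2^n - 1)^2 and that MRED skips inputs whose exact product is zero; conventions in the literature vary. `truncated_mul` is a hypothetical stand-in approximate multiplier used only to exercise the code, not one of the designs compared in Fig. 3.

```python
def nmed_mred(approx_mul, n=8):
    """Exhaustive NMED and MRED of an n-bit unsigned approximate multiplier.

    ED_i = |M_i - M'_i| over all 2^(2n) input pairs; NMED divides the mean ED by
    the largest exact product (2^n - 1)^2, and MRED averages ED_i / M_i over the
    inputs whose exact product M_i is nonzero (an assumed convention).
    """
    max_product = (2 ** n - 1) ** 2
    total_ed = 0
    rel_sum, rel_count = 0.0, 0
    for a in range(2 ** n):
        for b in range(2 ** n):
            exact = a * b
            ed = abs(exact - approx_mul(a, b))
            total_ed += ed
            if exact != 0:
                rel_sum += ed / exact
                rel_count += 1
    mean_ed = total_ed / 4 ** n  # 4^n = 2^(2n) input pairs
    return mean_ed / max_product, rel_sum / rel_count


def exact_mul(a, b):
    return a * b


def truncated_mul(a, b, k=4):
    # Hypothetical approximate multiplier: clears the k LSBs of the product
    return (a * b) >> k << k


print("exact:     NMED=%.3e MRED=%.3e" % nmed_mred(exact_mul))  # both 0
print("truncated: NMED=%.3e MRED=%.3e" % nmed_mred(truncated_mul))
```

Passing a bit-accurate model of a compressor-based multiplier in place of `truncated_mul` would yield the same kind of exhaustive evaluation for that design.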
2. Hardware Performance Analysis
Table 2 shows the hardware performance of the exact and approximate compressors in terms of area, power, critical path delay, and power-delay product (PDP). The proposed compressor outperforms the exact compressor in area, power, and delay by 59.50%, 73.13%, and 52.50%, respectively, and reduces PDP by 87.39%. Compared with the Momeni compressor, the compound-gate implementation reduces area and power by 34.75% and 39.22%, respectively, even with the error recovery logic included.
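The percentages above are relative reductions, 100 × (baseline - proposed) / baseline, computed from Table 2. The sketch below recomputes them from the rounded table entries and also checks that each PDP entry equals power × delay (1 µW × 1 ns = 1 fJ). Because the table values are rounded, recomputed figures can differ from the reported ones in the second decimal place.

```python
# Table 2 entries: (area in um^2, power in uW, delay in ns, PDP in fJ)
TABLE2 = {
    "Exact": (29.48, 8.72, 0.40, 3.49),
    "Momeni": (18.30, 3.85, 0.14, 0.54),
    "Akbar1": (12.20, 2.58, 0.12, 0.31),
    "Akbar2": (14.74, 2.27, 0.12, 0.27),
    "Venka": (19.31, 3.88, 0.15, 0.58),
    "Pei": (14.23, 3.19, 0.18, 0.57),
    "Proposed": (11.94, 2.34, 0.19, 0.44),
}


def reduction(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


# PDP (fJ) = power (uW) x delay (ns), because 1e-6 W x 1e-9 s = 1e-15 J
for name, (_, power, delay, pdp) in TABLE2.items():
    print(f"{name:9s} power x delay = {power * delay:.4f} fJ, table PDP = {pdp:.2f} fJ")

ours = TABLE2["Proposed"]
vs_exact = [reduction(b, v) for b, v in zip(TABLE2["Exact"], ours)]
vs_momeni = [reduction(b, v) for b, v in zip(TABLE2["Momeni"][:2], ours[:2])]
print("vs Exact  (area, power, delay, PDP):", [f"{r:.2f}%" for r in vs_exact])
print("vs Momeni (area, power):", [f"{r:.2f}%" for r in vs_momeni])
```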
Table 2. Hardware performance of compressors
Compressor | Area (µm²) | Power (µW) | Delay (ns) | PDP (fJ)
Exact | 29.48 | 8.72 | 0.40 | 3.49
Momeni [24] | 18.30 | 3.85 | 0.14 | 0.54
Akbar1 [25] | 12.20 | 2.58 | 0.12 | 0.31
Akbar2 [25] | 14.74 | 2.27 | 0.12 | 0.27
Venka [26] | 19.31 | 3.88 | 0.15 | 0.58
Pei [27] | 14.23 | 3.19 | 0.18 | 0.57
Proposed | 11.94 | 2.34 | 0.19 | 0.44
|
Table 3 summarizes the hardware performance of the approximate multipliers in terms of the four aforementioned hardware metrics. Note that the proposed multiplier adopts the proposed C-N and C-FULL configurations shown in Fig. 2, whereas the existing multipliers employ the conventional C-N and C-FULL configurations of [17]. In the C-N configuration, our proposed multiplier achieves the smallest area and the second-best power and PDP. In contrast, the Momeni multiplier exhibits the highest power and PDP, and the Venka multiplier occupies the largest area. Since the proposed compressor, unlike the Momeni compressor, is implemented with compound gates, this result suggests that compound gates are more hardware-efficient than combinations of regular logic gates [28]. Specifically, our multiplier achieves reductions of up to 12.72%, 10.78%, and 10.78% in area, power, and PDP, respectively, compared with the other five designs. A similar trend holds under the C-FULL configuration, where our multiplier occupies the smallest area and shows the second-best power and PDP. The Venka multiplier exhibits the largest area and PDP, while the Momeni multiplier consumes the highest power. Notably, the Pei multiplier achieves the lowest power and PDP in this configuration.
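An "up to" figure is the largest relative reduction over all competitors for a given metric. The sketch below recomputes the C-N values quoted above from Table 3; the dictionary layout and function name are only illustrative. A negative entry (for example, power versus Pei, which consumes less) is simply never the maximum.

```python
# C-N entries of Table 3: (area in um^2, power in uW, PDP in fJ)
C_N = {
    "Momeni": (752.01, 203.99, 236.63),
    "Akbar1": (697.12, 192.97, 223.85),
    "Akbar2": (719.99, 191.00, 221.56),
    "Venka": (761.16, 202.37, 234.75),
    "Pei": (715.42, 176.27, 202.71),
}
PROPOSED = (664.33, 182.00, 211.12)
METRICS = ("area", "power", "PDP")


def up_to_reductions(competitors, ours):
    """For each metric, the largest reduction (in percent) of ours over any
    competitor, together with the competitor that yields it."""
    result = {}
    for k, metric in enumerate(METRICS):
        result[metric] = max(
            (100.0 * (vals[k] - ours[k]) / vals[k], name)
            for name, vals in competitors.items()
        )
    return result


for metric, (pct, name) in up_to_reductions(C_N, PROPOSED).items():
    print(f"{metric}: up to {pct:.2f}% lower than {name}")
```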
Table 3. Hardware performance of approximate multipliers under C-N and C-FULL configurations
Multiplier | C-N area (µm²) | C-N power (µW) | C-N delay (ns) | C-N PDP (fJ) | C-N FOM | C-FULL area (µm²) | C-FULL power (µW) | C-FULL delay (ns) | C-FULL PDP (fJ) | C-FULL FOM
Momeni [24] | 752.01 | 203.99 | 1.16 | 236.63 | 337.37 | 662.55 | 160.42 | 1.02 | 163.63 | 5348.44
Akbar1 [25] | 697.12 | 192.97 | 1.16 | 223.85 | 531.81 | 558.86 | 138.37 | 1.01 | 139.75 | 3107.43
Akbar2 [25] | 719.99 | 191.00 | 1.16 | 221.56 | 239.41 | 602.07 | 138.93 | 1.02 | 141.71 | 1867.74
Venka [26] | 761.16 | 202.37 | 1.16 | 234.75 | 259.78 | 679.84 | 160.01 | 1.03 | 164.81 | 2330.63
Pei [27] | 715.42 | 176.27 | 1.15 | 202.71 | 1034.52 | 593.43 | 112.04 | 1.07 | 119.88 | 7769.28
Proposed | 664.33 | 182.00 | 1.16 | 211.12 | 202.29 | 508.80 | 123.75 | 1.02 | 126.23 | 678.88
|
Furthermore, to comprehensively evaluate the tradeoff between accuracy and energy efficiency in the approximate multipliers, we use a figure of merit (FOM) from [29] that accounts for energy, area, delay, and NMED. The FOM is defined by:
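A form consistent with this description and with the values in Table 3 is the product of the four quantities, with energy taken as the PDP. Any constant normalization used in [29] is not reproduced here, so the expression below should be read as an assumption. As a check, the C-N ratio between the Pei and proposed FOMs matches the product of the corresponding PDP, area, and delay ratios and the NMED ratio implied by the 79.94% NMED reduction reported in the accuracy analysis.

```latex
\mathrm{FOM} = \mathrm{PDP} \times \mathrm{Area} \times \mathrm{Delay} \times \mathrm{NMED}

\frac{\mathrm{FOM}_{\mathrm{Pei}}}{\mathrm{FOM}_{\mathrm{Prop}}}
  = \frac{1034.52}{202.29} \approx 5.11,
\qquad
\frac{202.71}{211.12} \cdot \frac{715.42}{664.33} \cdot \frac{1.15}{1.16}
  \cdot \frac{1}{1 - 0.7994}
  \approx 0.960 \times 1.077 \times 0.991 \times 4.985 \approx 5.11
```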
The FOM values are presented in Table 3, where a lower FOM indicates a more favorable tradeoff. Our proposed multiplier is the most efficient design when both accuracy and hardware cost are considered, whereas the Pei multiplier shows the least favorable tradeoff under both configurations. Specifically, the proposed multiplier reduces the FOM by 80.52% and 91.26% under the C-N and C-FULL configurations, respectively, compared with the Pei multiplier. Overall, these results underscore that the proposed multiplier balances accuracy and hardware efficiency well.
IV. CASE STUDY: IMAGE PROCESSING
To evaluate the practical performance of the proposed multiplier in real-world scenarios, we applied the proposed design and several existing approximate multipliers to a digital image processing task, namely image blending, in which multiplication plays a crucial role [23]. Using 8-bit unsigned approximate multipliers in both the C-N and C-FULL configurations, we evaluated image quality with the peak signal-to-noise ratio (PSNR) metric. We selected nine benchmark images and blended pairs of them pixel by pixel through multiplication.
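Multiplicative blending and its PSNR evaluation can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the 16-bit product is assumed to be scaled back to 8 bits by discarding its 8 LSBs, the images are tiny synthetic arrays rather than the benchmark images, and `operand_truncated_mul` is a hypothetical approximate multiplier standing in for the compressor-based designs.

```python
import math


def blend(img_a, img_b, mul):
    """Pixel-wise multiplicative blending of two 8-bit grayscale images.

    Assumption: the 16-bit product is scaled back to 8 bits by discarding its
    8 LSBs (product >> 8); the paper does not state the scaling it uses.
    """
    return [[mul(a, b) >> 8 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq_errors = [(r - t) ** 2
                 for row_r, row_t in zip(reference, test)
                 for r, t in zip(row_r, row_t)]
    mse = sum(sq_errors) / len(sq_errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def exact_mul(a, b):
    return a * b


def operand_truncated_mul(a, b, k=2):
    # Hypothetical approximate multiplier: drops k LSBs of each operand
    return ((a >> k) * (b >> k)) << (2 * k)


# Small synthetic 8-bit "images" standing in for the benchmark images
img_a = [[(17 * (i + j)) % 256 for j in range(4)] for i in range(4)]
img_b = [[(29 * i + 7 * j + 100) % 256 for j in range(4)] for i in range(4)]

reference = blend(img_a, img_b, exact_mul)
approximate = blend(img_a, img_b, operand_truncated_mul)
print(f"PSNR = {psnr(reference, approximate):.2f} dB")
```

The exactly blended image serves as the reference, so the PSNR isolates the error introduced by the approximate multiplier.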
Fig. 4 shows the blended images of two well-known benchmark images, House and Jetplane, obtained with various approximate multipliers in the C-N configuration, together with the corresponding PSNR of each blended image. The proposed multiplier outperforms the other approximate multipliers, producing a blended image with a PSNR of 53.03 dB. In contrast, the Akbar1 and Pei multipliers yield output images with PSNRs below 50 dB.
Fig. 4. The original images and the output images with PSNR values for image blending.
Table 4 lists the image blending results for four benchmark image pairs, together with their average values. The proposed multiplier yields the highest PSNR for all four image pairs in both the C-N and C-FULL configurations. In the C-N configuration, it improves image quality, as measured by PSNR, by 6.90% on average compared with the other designs, and by up to 19.29% compared with the Pei multiplier, which shows the lowest PSNR. The proposed design also delivers superior processing quality in the C-FULL configuration. Only the proposed design produces blended images with PSNR values above 36 dB, while most of the other counterparts produce images with PSNR values below 30 dB. In particular, compared with the Pei multiplier, our design improves PSNR by up to 101.68%, and compared with all other designs it achieves an average PSNR improvement of 44.53%.
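The averaged percentages above can be reproduced from Table 4 if the average improvement is taken as the mean, over the five competing designs, of the ratio between the proposed multiplier's average PSNR and each competitor's average PSNR; this interpretation is our inference from the numbers, not a definition stated in the text. The sketch also computes the largest per-image-pair gain over the Pei multiplier.

```python
# Average PSNR (dB) from Table 4
AVERAGE = {
    "C-N": {"Momeni": 52.17, "Akbar1": 48.33, "Akbar2": 52.16,
            "Venka": 52.31, "Pei": 44.83, "Proposed": 53.21},
    "C-FULL": {"Momeni": 25.75, "Akbar1": 27.04, "Akbar2": 31.20,
               "Venka": 31.39, "Pei": 19.19, "Proposed": 37.67},
}
# Per-pair PSNR (Moon/Cameraman, Einstein/Baboon, Barbara/Lake, Barbara/Lena)
PER_PAIR = {
    "C-N": {"Pei": (45.11, 45.05, 44.43, 44.73),
            "Proposed": (53.61, 53.12, 53.00, 53.13)},
    "C-FULL": {"Pei": (19.62, 18.53, 19.23, 19.40),
               "Proposed": (39.57, 36.35, 37.84, 36.89)},
}


def mean_improvement(averages):
    """Mean of the per-competitor PSNR ratios (ours / theirs), as a percent gain."""
    ours = averages["Proposed"]
    ratios = [ours / v for name, v in averages.items() if name != "Proposed"]
    return 100.0 * (sum(ratios) / len(ratios) - 1.0)


def max_improvement(ours, theirs):
    """Largest per-image-pair PSNR gain of ours over theirs, in percent."""
    return 100.0 * max(o / t - 1.0 for o, t in zip(ours, theirs))


# Prints C-N: 6.90% and 19.29%; C-FULL: 44.53% and 101.68%
for config in ("C-N", "C-FULL"):
    avg_gain = mean_improvement(AVERAGE[config])
    pei_gain = max_improvement(PER_PAIR[config]["Proposed"], PER_PAIR[config]["Pei"])
    print(f"{config}: average gain {avg_gain:.2f}%, max gain over Pei {pei_gain:.2f}%")
```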
Table 4. The PSNRs for the image blending using various approximate multipliers under the C-N and C-FULL configurations
Multiplier (PSNR in dB) | C-N: Moon/Cameraman | C-N: Einstein/Baboon | C-N: Barbara/Lake | C-N: Barbara/Lena | C-N: Average | C-FULL: Moon/Cameraman | C-FULL: Einstein/Baboon | C-FULL: Barbara/Lake | C-FULL: Barbara/Lena | C-FULL: Average
Momeni [24] | 52.28 | 52.10 | 52.08 | 52.21 | 52.17 | 24.42 | 26.76 | 25.97 | 25.85 | 25.75
Akbar1 [25] | 48.28 | 48.36 | 48.55 | 48.11 | 48.33 | 29.13 | 25.03 | 27.12 | 26.90 | 27.04
Akbar2 [25] | 52.05 | 52.33 | 52.17 | 52.10 | 52.16 | 32.49 | 29.39 | 30.37 | 32.60 | 31.20
Venka [26] | 52.17 | 52.47 | 52.35 | 52.25 | 52.31 | 32.62 | 29.91 | 30.41 | 32.63 | 31.39
Pei [27] | 45.11 | 45.05 | 44.43 | 44.73 | 44.83 | 19.62 | 18.53 | 19.23 | 19.40 | 19.19
Proposed | 53.61 | 53.12 | 53.00 | 53.13 | 53.21 | 39.57 | 36.35 | 37.84 | 36.89 | 37.67
|
To holistically evaluate both image processing quality and hardware efficiency, prior works introduced two metrics, QUPD and QUAP [18,30]. QUPD combines PSNR, power savings, and delay savings, whereas QUAP combines PSNR, area savings, and power savings; they are mathematically defined as follows:
In both metrics, PSNR is squared to emphasize its importance as a measure of image quality. Fig. 5 shows the QUPD and QUAP values of the multipliers under both the C-N and C-FULL configurations; higher values indicate better performance. The proposed multiplier is the best-performing design in both QUPD and QUAP. Specifically, it improves QUPD by 34.35% and 158.99% on average under the C-N and C-FULL configurations, respectively, compared with the other counterparts, and improves QUAP by 105.96% and 265.24% on average under the same configurations. The maximum improvements reach 46.25% and 168.2% in QUPD and QUAP, respectively, under the C-N configuration, and 310.14% and 413.23%, respectively, under the C-FULL configuration. In summary, the proposed multiplier design not only reduces hardware resource consumption but also provides outstanding image processing quality.
Fig. 5. Performance comparison with various multipliers in QUPD and QUAP: (a) C-N configuration; (b) C-FULL configuration.
V. CONCLUSIONS
In this paper, we introduced a cost-effective approximate multiplier based on a novel approximate 4-2 compressor with integrated error recovery logic. The proposed error recovery mechanism plays a crucial role in reducing error distances and enhancing the overall accuracy of approximate multiplication. Consequently, the multipliers using our proposed compressor achieved reductions of up to 89.8% and 97.1% in NMED and MRED, respectively, compared with the other approximate multipliers examined in this paper. Additionally, our multiplier designs reduced area, power, and PDP by up to 25.2%, 22.9%, and 24.0%, respectively, compared with the alternative designs. Moreover, our proposed design exhibited superior processing quality in a digital image blending application compared with the other alternatives, while reducing hardware resource consumption.
ACKNOWLEDGMENTS
This work was supported in part by the BK21 FOUR project (AI-driven Convergence Software
Education Research Program) funded by the Ministry of Education, School of Computer
Science and Engineering, Kyungpook National University, Korea (41202420214871) and
in part by the National Research Foundation of Korea (NRF) grant funded by the Korea
government (MSIT) (RS-2023-00279770).
References
[1] J. H. Kim, C. Kim, K. Kim, J. Lee, H.-J. Yoo, and J.-Y. Kim, “An Ultra-Low-Power Mixed-Mode Face Recognition Processor for Always-on User Authentication in Mobile Devices,” IEIE Journal of Semiconductor Technology and Science, Vol. 20, No. 6, pp. 499-509, Dec. 2020.
[2] T. Moreau, A. Sampson, and L. Ceze, “Approximate Computing: Making Mobile Systems More Efficient,” IEEE Pervasive Computing, Vol. 14, No. 2, pp. 9-13, Apr.-Jun. 2015.
[3] H. Seo, H. Seok, J. Lee, Y. Han, and Y. Kim, “Design of an Approximate Adder based on Modified Full Adder and Nonzero Truncation for Machine Learning,” IEIE Journal of Semiconductor Technology and Science, Vol. 23, No. 2, pp. 138-148, Apr. 2023.
[4] S. Kim and Y. Kim, “Novel XNOR-Based Approximate Computing for Energy-Efficient Image Processors,” IEIE Journal of Semiconductor Technology and Science, Vol. 18, No. 5, pp. 602-608, Oct. 2018.
[5] J. Lee, H. Seo, H. Seok, and Y. Kim, “A Novel Approximate Adder Design using Error Reduced Carry Prediction and Constant Truncation,” IEEE Access, Vol. 9, pp. 119939-119953, Aug. 2021.
[6] G. Park, J. Kung, and Y. Lee, “Design and Analysis of Approximate Compressors for Balanced Error Accumulation in MAC Operator,” IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 68, No. 7, pp. 2950-2961, Jul. 2021.
[7] H. Seok, H. Seo, J. Lee, and Y. Kim, “COREA: Delay- and Energy-Efficient Approximate Adder Using Effective Carry Speculation,” Electronics, Vol. 10, No. 18, pp. 2234:1-2234:12, Sep. 2021.
[8] W. Choi, M. Shim, H. Seok, and Y. Kim, “DCPA: Approximate Adder Design Exploiting Dual Carry Prediction,” IEICE Electronics Express, Vol. 18, No. 23, pp. 1-4, Dec. 2021.
[9] H. Seo, Y. S. Yang, and Y. Kim, “Design and Analysis of an Approximate Adder with Hybrid Error Reduction,” Electronics, Vol. 9, No. 3, pp. 471:1-471:13, Mar. 2020.
[10] J. Lee, H. Seo, Y. Kim, and Y. Kim, “Approximate Adder Design with Simplified Lower-part Approximation,” IEICE Electronics Express, Vol. 17, No. 15, pp. 1-3, Aug. 2020.
[11] H. Seo and Y. Kim, “A Low Latency Approximate Adder Design based on Dual Sub-Adders with Error Recovery,” IEEE Transactions on Emerging Topics in Computing, Vol. 11, No. 3, pp. 811-816, Jul.-Sep. 2023.
[12] V. K. Chippa, S. T. Chakradhar, and A. Raghunathan, “Analysis and Characterization of Inherent Application Resilience for Approximate Computing,” ACM/IEEE Design Automation Conference (DAC), pp. 1-9, May 2013.
[13] H. Kim, E. Ham, S. Park, H. Kim, and J.-H. Kim, “A DRAM Bandwidth-Scalable Sparse Matrix-Vector Multiplication Accelerator with 89% Bandwidth Utilization Efficiency for Large Sparse Matrix,” IEEE Asian Solid-State Circuits Conference (A-SSCC), Nov. 2023.
[14] C.-H. Lin and I.-C. Lin, “High Accuracy Approximate Multiplier with Error Correction,” IEEE International Conference on Computer Design (ICCD), pp. 33-38, Oct. 2013.
[15] C.-H. Chang, J. Gu, and M. Zhang, “Ultra Low-Voltage Low-Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits,” IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 51, No. 10, Oct. 2004.
[16] A. Saha, R. Pal, A. G. Naik, and D. Pal, “Novel CMOS Multi-bit Counter for Speed-Power Optimization in Multiplier Design,” AEU-International Journal of Electronics and Communications, Vol. 95, pp. 189-198, Oct. 2018.
[17] A. G. M. Strollo, E. Napoli, D. De Caro, N. Petra, and G. D. Meo, “Comparison and Extension of Approximate 4-2 Compressors for Low-Power Approximate Multipliers,” IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 67, No. 9, pp. 3021-3034, Sep. 2020.
[18] Z. Yang, J. Han, and F. Lombardi, “Approximate Compressors for Error-Resilient Multiplier Design,” IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 183-186, Oct. 2015.
[19] M. Ha and S. Lee, “Multipliers with Approximate 4-2 Compressors and Error Recovery Modules,” IEEE Embedded Systems Letters, Vol. 10, No. 1, pp. 6-9, Mar. 2018.
[20] M. Ahmadinejad, M. H. Moaiyeri, and F. Sabetzadeh, “Energy and Area Efficient Imprecise Compressors for Approximate Multiplication at Nanoscale,” AEU-International Journal of Electronics and Communications, Vol. 110, Art. no. 152859, Oct. 2019.
[21] T. Kong and S. Li, “Design and Analysis of Approximate 4-2 Compressors for High-Accuracy Multipliers,” IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, Vol. 29, No. 10, pp. 1771-1781, Oct. 2021.
[22] M. Zhang, S. Nishizawa, and S. Kimura, “Area Efficient Approximate 4-2 Compressor and Probability-Based Error Adjustment for Approximate Multiplier,” IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 70, No. 5, May 2023.
[23] F. Sabetzadeh, M. H. Moaiyeri, and M. Ahmadinejad, “A Majority-Based Imprecise Multiplier for Ultra-Efficient Approximate Image Multiplication,” IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 66, No. 11, pp. 4200-4208, Nov. 2019.
[24] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and Analysis of Approximate Compressors for Multiplication,” IEEE Transactions on Computers, Vol. 64, No. 4, pp. 984-994, Apr. 2015.
[25] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, “Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers,” IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, Vol. 25, No. 4, pp. 1352-1361, Apr. 2017.
[26] S. Venkatachalam and S. Ko, “Design of Power and Area Efficient Approximate Multipliers,” IEEE Transactions on Very Large-Scale Integration (VLSI) Systems, Vol. 25, No. 5, pp. 1782-1786, May 2017.
[27] H. Pei, X. Yi, H. Zhou, and Y. He, “Design of Ultra-Low Power Consumption Approximate 4-2 Compressors Based on the Compensation Characteristic,” IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 68, No. 1, pp. 461-465, Jan. 2021.
[28] H. Seok, H. Seo, J. Lee, and Y. Kim, “Design Optimization of a 4-2 Compressor for Low-Cost Approximate Multipliers,” IEIE Transactions on Smart Processing and Computing, Vol. 11, No. 6, pp. 455-461, Dec. 2022.
[29] J. Lee, H. Seo, H. Seok, and Y. Kim, “A Novel Approximate Adder Design Using Error Reduced Carry Prediction and Constant Truncation,” IEEE Access, Vol. 9, pp. 119939-119953, Aug. 2021.
[30] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, “IMPACT: IMPrecise adders for low-power approximate computing,” IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 409-414, Aug. 2011.
Sungyoun Hwang received her B.S. degree from the School of Computer Science and Engineering
at Kyungpook National University, Daegu, Republic of Korea in 2024, where she is currently
pursuing an M.S. degree. Her research interests include approximate multiplier, computer
arithmetic, and quantum computing.
Hyelin Seok received her B.S. and M.S. degrees from the School of Computer Science
and Engineering, Kyungpook National University, Daegu, Republic of Korea in 2022 and
2024, respectively. Her research interests include computer architecture, approximate
arithmetic, and new computing systems.
Yongtae Kim received B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and a Ph.D. degree from the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, in 2013. From 2013 to 2018, he was a software engineer
with Intel Corporation, Santa Clara, CA. Since 2018, he has been with the School of
Computer Science and Engineering at Kyungpook National University, Daegu, South Korea,
where he is currently an Associate Professor. His research interests are in energy-efficient
integrated circuits and systems, particularly, neuromorphic computing, approximate
computing, quantum computing, and new memory devices and architecture.