Baik Junhyuk1
Kim Yongtae1*
-
(School of Computer Science and Engineering, Kyungpook National University, Daegu,
Korea )
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Hash function, Secure hash algorithm (SHA), SHA-256, Approximate computing, Approximate adder, Cryptographic, Ciphertext, Throughput
1. Introduction
A hash function is a category of cryptographic primitives that plays an important
role in ensuring security methods, such as data integrity and authentication. An original
input message is condensed by a hash function and becomes a fixed-length ciphertext
known as a message digest. The hash function used to generate a ciphertext must be
computationally simple but impossible to revert back to its original. Because of its
efficiency and reliability, the SHA-256 hash function, which was released by the National
Institute of Standards and Technology (NIST), is widely used in the field of security
[1]. Since the hardware design of the SHA-256 provides better throughput than the software
design, it is more pertinent for applications that demand high throughput. Furthermore,
the hardware design of the SHA-256 has a non-disruptive feature, unlike the software
design [2].
Increasing energy efficiency has become important because the hardware is faced with
limited battery and energy resources. In order to achieve high energy efficiency and
hardware performance of the hardware design, approximate computing, which trades off
the computation precision for acceptable processing quality, can be leveraged.
In this paper, we propose a novel high-throughput SHA-256 design, which reduces the
area and delay effectively using approximate arithmetic. Specifically, we focus on
replacing the existing adder of the conventional SHA-256 with the approximate adders
to improve the hardware performance and save hardware resource consumption while satisfying
hash functionality. The existing 32-bit adder is replaced by the split k-bit adder.
The proposed design was analyzed mathematically, and its hardware performance (e.g.,
energy and throughput) was also evaluated. In addition, to check whether the generated
message digest is cryptographically secure and distributed randomly, we conduct the
randomness tests developed by NIST and Avalanche effect [3]. The proposed design improves hardware performance and the Avalanche effect, compared
with the conventional design, and all the message digests generated by the proposed
SHA-256 are randomly distributed. The results verified that the proposed design could
be competitive.
In summary, the contribution of this work is as follows:
· We present a new SHA-256 design that exploits approximate adders to improve hardware
efficiency.
· The proposed SHA-256 is thoroughly analyzed to confirm that it improves overall
throughput and energy efficiency that stems from shortened carry chain compared with
the existing adder.
· To verify that the generated message digest is cryptographically secure, Avalanche
effect and randomness tests were performed and passed.
2. Background and Motivation
2.1 SHA-256 Basics
The SHA-2 was developed with a number of noteworthy upgrades from the SHA-1. The SHA-2
is made up of four different algorithms, which correspond to output lengths of 224-,
256-, 384-, and 512-bit, respectively. Although it consumes more memory and takes
longer to process, a longer output length is beneficial for security. Because of its
great efficiency, processing speed, and Avalanche effect, SHA-256 becomes a key component
in many applications, such as blockchain and hash-based message authentication code
(HMAC). The overall structure of the SHA-256 process is shown in Fig. 1. The SHA-256 hash algorithm can be divided into three steps: Pre-processing, Message
scheduler, and Compression function.
Fig. 1. Block diagram of SHA-256.
Pre-processing, a step in padding an input message, is the first stage of the SHA-256.
We define L as the length of the original message. A single bit "1" is added to the
end of the message, followed by K "0" bits, which must satisfy the equation L + 1
+ K ${\equiv}$ 448 mod 512. The length of the message, calculated in 64-bit big-endian,
is added to the rest of the message. Thus, the original message is padded and becomes
a message, which is a multiple of 512-bit. The second step of SHA-256 is the Message
scheduler. In the message scheduler, each 32-bit word: W$_{t}$ is produced during
64 rounds. We define t as representing the number of rounds from 0 to 63. W$_{t}$
is as follows:
where
In (2), the ROTR$_{n}$(x) denotes a right rotation of x by n-bit, and the SHFR$_{n}$(x)
denotes a right shift of x by n-bit. All variables in the above equations are 32-bit
unsigned integers and addition is calculated modulo 2$^{32}$. The compression function
is the final step of SHA-256, where the variables A, B, C, D, E, F, G, and H are updated.
The constant value K$_{t}$ is obtained from the first 32-bit of the fractional parts
of the cube roots of the first 64 primes. In the first round, the variables are assigned
by initial hash values, calculated by the first 32-bit of the fractional parts of
the square roots of the first 8 primes. Every 64 rounds, the variables are updated
using the previously calculated W$_{t}$ and K$_{t}$. The A, B, C, D, E, F, G, and
H can be computed by:
where
The initial hash values: $H_{0},\,H_{1},\,H_{2},\,H_{3},\,H_{4},\,H_{5},\,H_{6},$
and $H_{7}$ can be calculated by using (5).
In the end, the output is computed by concatenating the hash values.
2.2 Motivational Analysis
The conventional SHA-256 designed in Verilog HDL and synthesized using 28-nm CMOS
technology consumes the area, delay, and power of 11964.26 $\mu$m$^{2}$, 1.40 ns,
and 6.24 mW, respectively. As can been seen in Fig. 1, clearly, the critical path of the SHA-256 design lies on the computation of the
next A value. This path includes several 32-bit adders, which degrade overall delay
and energy efficiency. Hence, we re-design the 32-bit adders to reduce the critical
path delay. Since the addition modulo 2$^{32}$ significantly reduces the speed of
the hash function, compared to the other calculation steps [4], in order to achieve low hardware consumption, we focused on replacing the addition
modulo 2$^{32}$with an approximate adder, which offers chances to increase energy
efficiency by lowering computation precision. The 32-bit adder was split into 1-,
2-, 4-, 8-, and 16-bit, respectively, resulting in reducing delay and area, and achieving
high throughput in all the adders.
3. Related Works
Various research works have been studied to improve the throughput and performance
of SHA-256 [2, 5-12]. Also, previous research has been conducted on whether it is
cryptographically secure while verifying several tests, such as the Avalanche test,
and randomness tests [13,14].
Rote et al. [2] proposed a new architecture using a fully iterative and Round Pipelined Technique
(RPT) to improve throughput per area of SHA-2. In [5], a rescheduling method during round computations was proposed and implemented on
the Xilinx Virtex-4 FPGA. It enhanced the quality of throughput by reducing the critical
path. Xiaoyong et al. [6] suggested a high-performance computation hardware architecture in the ASIC of SHA-256.
To shorten the critical path, parallel pipelines were used and the computation chain
was converted to independent parts. Power and area decreased compared to the conventional
SHA-256. In [7], through optimization techniques, a design was proposed that increases throughput
due to reduced power and area. An unfolding design of SHA-256 based on FPGA was presented
in [8]. The designs produced an unfolding architecture, which makes maximum frequency and
area better than conventional SHA-256. Kahri et al. [9] proposed efficient hardware implementation of SHA-256 on the FPGA. For the hash function,
a finite state machine (FSM) is employed. In [10], in order to optimize SHA-256, a pre-computation method was devised. Compared to
the conventional pipelined SHA-256, the throughput increased, and the area penalty
decreased. Chaves et al. [11] developed a new technique using operation rescheduling and hardware reutilization.
The critical path was significantly reduced while the required area also decreased
thanks to the technique. Glabb et al. [12] presented a multi-mode architecture that can independently execute the SHA-224 and
SHA-256 blocks as well as the SHA-384 or SHA-512 hash algorithms. This research wanted
to considerably reduce the computing overhead while avoiding any time delays brought
on by handling input messages.
To address the Length Extension attack problem, an improved padding method and modified
hashing process were presented in [13]. Statistical tests employing the Strict Avalanche effect, frequency test (Monobit),
frequency test within a block, and run test were carried out to demonstrate that the
updated hash function is cryptographically secure. Since the distribution of zeros
and ones in the final hash value under tests was uniformly random, the message digest
produced by modified SHA-256 was cryptographically secure. Roshdy et al. [14] proposed a new secure hash algorithm based on SHA-256 and Message Digest 5 (MD5).
The proposed algorithm met the requirements of the Avalanche test and differential
attack test.
4. Proposed SHA-256 Architecture
In SHA-256 hardware design, the primary steps, message scheduler and compression function,
were replaced with a novel design that could save hardware considerably by using approximate
computing to reduce hardware resource consumption. Specifically, to reduce the delay
by shortening the critical path, the 32-bit adder in the message scheduler and compression
function was swapped out with the approximate adder. The 32-bit adder accounts for
a large part of the operation of SHA-256, compared to the other calculation steps.
The addition is used seven times from round 0 to 15, 10 times from 16 to 63, and eight
times when the round is finished, and the last hash values are added. Thus, applying
approximate computing to the existing adder can significantly increase the energy
efficiency of the SHA-256 hardware design. The approximate adder used in the proposed
SHA-256 design is illustrated in Fig. 3. Note that it is an improvement in the structure of the adders painted gray in Fig. 1 and the adders in the message scheduler. The 32-bit adder is split into several k-bit
adders, where k is 1-, 2-, 4-, 8-, and 16-bit adder, respectively. Each adder divided
into k-bit is calculated separately, and all carry-in is ignored in the adders to
cut the long carry propagation chain and improves the latency. It is noteworthy that
the k-bit adder does not depend on the addition algorithms but can be implemented
in any kind of accurate adders, such as ripple carry adder (RCA), carry save adder
(CSA), and carry-lookahead adder (CLA). The 32-bit is calculated by dividing them
into k-bits, and the final outcome of the split adder is to concatenate the k-bits
that result from each. Since a carry-out is ignored when it occurs, the carry chain
is shortened when compared with the existing 32-bit adder. This results in a decrease
in delay and area, which allows high throughput and performance of the proposed design
to be achieved.
Fig. 3. (a) the existing 32-bit adder; (b) general hardware architecture of the proposed approximate adder of SHA-256.
5. Experimental Results
5.1 Hardware Performance
To evaluate the hardware resource consumption, the proposed SHA-256 design was synthesized
in Verilog HDL using 28-nm CMOS technology. Using the same design methodology, we
also synthesized and implemented the conventional design so that we could compare
it to the conventional SHA-256 design. In addition, the proposed designs functionally
verified that the generated message digest is secure cryptographically and randomly
distributed.
The conventional SHA-256 and the proposed SHA-256 with various design configurations
are summarized in Table 1. We compared the conventional SHA-256 with designs where the 32-bit adder is replaced
by a split approximate adder with 1-, 2-, 4-, 8-, and 16-bit configurations, respectively.
Comparing the proposed SHA-256 design to the conventional design, the area is reduced
by up to 15% in the 1-bit adder-based SHA-256 design. Furthermore, there is a noticeable
improvement in the delay as expected. As a result of the synthesis, we confirmed that
the adders of calculating A$_{\mathrm{t+1}}$ in Fig. 1 are the critical pass. In all the proposed SHA-256 design configurations, the delay
reduces by 15.7%, 27.1%, 35.0%, 45.7%, and 52.9%, respectively. The considerable overall
delay reduction stems from the approximate adder that cuts the long carry chain to
enhance its delay performance. The power increases slightly as the delay reduces.
The power increases from a minimum of 17.1% to a maximum of 46.8%. The energy is improved
in all the configurations, especially 9.3% and 11.3% in 2-bit and 1-bit configurations,
respectively. The result shows that the area, delay, and energy are further improved
as the 32-bit adder is split into smaller units.
Table 1. Performance summary of various adders of proposed SHA-256 design.
Design
|
Area (${\mu}$m$^{2}$)
|
Delay (ns)
|
Power
(mW)
|
Energy
(pJ)
|
Conv.
|
11946.2
|
1.40
|
6.24
|
8.74
|
16-bit
|
11914.0
|
1.18
|
7.31
|
8.63
|
8-bit
|
11595.1
|
1.02
|
8.12
|
8.29
|
4-bit
|
11171.7
|
0.91
|
8.93
|
8.12
|
2-bit
|
10605.5
|
0.76
|
10.44
|
7.93
|
1-bit
|
10158.1
|
0.66
|
11.74
|
7.75
|
To compare the SHA-256 designs in terms of both area and delay, we evaluate the area-delay
product (ADP). Fig. 4 exhibits the ADP of the conventional SHA-256 and the proposed SHA-256 designs. In
all the configurations of the proposed SHA-256, the ADP is significantly improved
by 19.1%, 41.6%, 64.8%, 107.8%, and 149.8% in the 16-, 8-, 4-, 2-, and 1-bit configurations,
respectively.
Additionally, the energy-delay product (EDP) was computed to evaluate the design from
the perspective of energy efficiency. Fig. 5 shows the EDP comparison between the proposed designs and the conventional SHA-256.
According to the result, all the proposed designs enhance the EDP by 20.2%, 44.8%,
65.6%, 103.0%, and 139.3%, respectively.
In addition to the EDP, the throughput was calculated to compare the designs in terms
of the performance of the hardware design. Fig. 6 shows the comparison of the throughput. The throughput can be calculated by using
the following equation
In all the configurations of the proposed SHA-256, the throughput is improved considerably.
There was an improvement of 18.6%, 37.3%, 53.8%, 84.2%, and 112.1% in the 16-, 8-,
4-, 2-, 1-bit configuration, respectively.
Fig. 4. Comparison of area-delay product (ADP) of various SHA-256 designs.
Fig. 5. Comparison of energy-delay product (EDP) of various SHA-256 designs.
Fig. 6. Comparison of throughput of various SHA-256 designs.
5.2 Cryptographic Performance
The message digest generated by the proposed modified SHA-256 is different from the
conventional SHA-256. Therefore, to verify that the generated message digest is cryptographically
secure, several tests, such as the Avalanche effect, frequency test (Monobit), frequency
test within a block, and runs test were implemented. These tests confirm how much
of the 512 bits are flipped and how uniformly and randomly the generated message of
the proposed design is distributed.
First, we obtained the Avalanche effect to evaluate how a change in one bit affects
the ciphertext. The Avalanche effect is defined by
Table 2 summarizes the result of comparing conventional SHA-256 and the proposed SHA-256
designs. We have used the same input messages used in [13] to evaluate the Avalanche effect. The conventional SHA-256 achieves (44%, 47%, 46%),
(45%, 55%, 49%) in the 16-bit, (44%, 52%, 48%) in the 8-bit, (48%, 54%, 52%) in the
4-bit, (49%, 54%, 52%) in the 2-bit, and (46%, 52%, 49%) in the 1-bit configuration,
in the order of minimum, maximum, and average. In all of the proposed designs, the
average Avalanche effect is higher than the traditional one.
Table 3 shows the results of the randomness tests of all the SHA-256 designs. Like the Avalanche
effect, this randomness test was also conducted by applying the same input message
used in [13] to the proposed SHA-256 designs. In a 16-bit configuration, the average p-value is
(0.429242, 0.548896, 0.772074) each for the frequency test (Monobit), frequency test
with a block, and runs test. The average p-value is (0.286187, 0.380368, 0.468927),
(0.449564, 0.580835, 0.449564), (0.452549, 0.495265, 0.386507), and (0.503335, 0.537192,
0.602609) in the 8-, 4-, 2-, 1-bit configuration, respectively. Since the p-value
is ${\geq}$ 0.01, the conclusion is that all the results are random.
In short, the experimental results confirm that the proposed modified SHA-256 design
has a comparable cryptographical performance with the conventional design and is cryptographically
secure.
Table 2. Performance summary of Avalanche effect of various SHA-256 designs.
Input Message
|
Conventional SHA-256
|
16-bit SHA-256
|
8-bit SHA-256
|
4-bit SHA-256
|
2-bit SHA-256
|
1-bit SHA-256
|
Message Digest
|
Avala
nche Effect
|
Hash
|
Avala
nche Effect
|
Hash
|
Avala
nche Effect
|
Hash
|
Avala
nche Effect
|
Hash
|
Avala
nche Effect
|
Hash
|
Avala
nche Effect
|
The quick brown fox jumped over the head of the lazy dog.
|
181756ad8209f
cf8201ecca48e
1ab001861395a
40bb997b3616
8bea6c2c2c1a8
|
46.48%
|
d16f9752bfa6e
1353b14dcbd2a
d390011d0a07
bc5f3501a192ff
613bb5bab147
|
48.05%
|
350aace2d5fe3
050d33225b47
35b9de17b7f7f
1d4d16d9a1bb7
defae08562fac
|
52.34%
|
7254bb6972f2d
f6f802f73798d
261b76f9da164
7a25b8f6d9d0f
3d96d8ff35ce
|
54.30%
|
aaed49f93b4a2
d8a88582be2a6
2ac159bca4474
67a02988d193
5aba126830fa7
|
49.61%
|
6bef51e19c6e0
c9e9536c948d6
affefb2317c178
36ca8ea5c42ac
99f71cd5f01
|
46.88%
|
The QUICK brown fox jumped over the head of the lazy dog.
|
e0c5f8dd34312
c110c52c1594e
2ec545b968113
471709bb6812
ddd8f73daafe7
|
ec88815437e67
a8507f87d8b60
ef3a17782ab87
499407c7e10a1
4b36cd5511d6
|
c2a9953ed0718
33ddbfca82114
514212e931836
9b7e083820bda
c28c48af7ecd
|
4863c85cf6830
ddab3f2ca0fbdf
33f4cb5070bf1
8f36a53d62eb6
a731bb49621
|
8320dc22a9f24
60a19c788c78f
e355dae302398
137bca8956c07
c4504e279867
|
8cdb7a0c1fe08
d39d03b5e92b
23b16f57e77ec
3fa412a54bca2
93a1af4e65960
|
Table 3. Randomness performance summary of various SHA-256 designs.}
Input Message
|
Randomness Test
|
16-bit SHA-256
|
8-bit SHA-256
|
4-bit SHA-256
|
2-bit SHA-256
|
1-bit SHA-256
|
P-value
|
Result
|
P-value
|
Result
|
P-value
|
Result
|
P-value
|
Result
|
P-value
|
Result
|
The QUICK brown fox jumped over the head of the lazy dog.
|
Frequency Test (Monobit)
|
0.617075
|
Random
|
0.260589
|
Random
|
0.381574
|
Random
|
0.133614
|
Random
|
0.617075
|
Random
|
Frequency Test within a block
|
0.730552
|
Random
|
0.374911
|
Random
|
0.605303
|
Random
|
0.342296
|
Random
|
0.409438
|
Random
|
Runs Test
|
0.814516
|
Random
|
0.963209
|
Random
|
0.765135
|
Random
|
0.213014
|
Random
|
0.790328
|
Random
|
6. Conclusion
This paper presented a novel high-throughput SHA-256 design that reduces the hardware
cost effectively using approximate computing. The existing 32-bit adders of the SHA-256
are replaced by the proposed split k-bit adder to achieve high throughput. To compare
our proposed design with the conventional SHA-256 design, both designs were designed
in Verilog HDL and synthesized using 28-nm CMOS technology. In addition, we functionally
verified the generated message digest to examine whether it is cryptographically secure
and randomly distributed. The results demonstrated that the proposed designs, when
compared to the traditional one, reduce area, delay, and energy by 8.9%, 45.7%, and
9.3 in the 2-bit configuration and by 15%, 52.9%, and 11.3% in the 1-bit configuration,
respectively. Additionally, the proposed designs improve the ADP, EDP, and throughput
performances by at least 19.1%, 20.2%, and 18.6% with a maximum of 149.8%, 139.3%,
and 112.1%, respectively. Also, all the proposed designs obtain a higher average Avalanche
effect than the traditional ones, and all the message digests generated by the proposed
SHA-256 are randomly distributed. Based on the results, the proposed design reduced
area and power, improving hardware throughput and energy efficiency. Therefore, our
proposed designs can be a new alternative to the SHA-256 with high hardware performance
and energy efficiency.
ACKNOWLEDGMENTS
This work was supported in part by the Basic Science Research Program through
the National Research Foundation of Korea (NRF) funded by the Ministry of Education
(NRF-2019R1I1A3A01061266) and in part by the BK21 FOUR project (AI-driven Convergence
Software Education Research Program) funded by the Ministry of Education, School of
Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).
REFERENCES
National Institute of Standards and Technology (NIST) , August 2015, Announcing the
secure hash standard. Federal Information Processing Standards Publication 180-4 (FIPS
PUB 180-4)
Rote M. D., Vijendran N., Selvakumar D., 2015, High performance SHA-2 core using the
Round Pipelined Technique, 2015 IEEE International Conference on Electronics, Computing
and Communication Technologies (CONECCT), pp. 1-6
Bassham L., Rukhin A., Soto J., Nechvatal J., Smid M., Leigh S., Levenson M., Vangel
M., Heckert N., Banks D., 2010, A Statistical Test Suite for the Validation of Random
Number Generators and Pseudo Random Number Generators for Cryptographic Applications,
NIST Pubs Special Publication (NIST SP) - 800-22 Rev 1a
Dadda L., Macchetti M., Owen J., 2004, The design of a high speed ASIC unit for the
hash function SHA-256 (384, 512), Proceedings Design, Automation and Test in Europe
Conference and Exhibition, Vol. 3, pp. 70-75
Chen Y., Li S., 2020, A High-Throughput Hardware Implementation of SHA-256 Algorithm,
2020 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4
Zhang X., WU R., Wang M., Wang L., 2019, A High-Performance Parallel Computation Hardware
Architecture in ASIC of SHA-256 Hash, 2019 21st International Conference on Advanced
Communi-cation Technology (ICACT), pp. 52-55
Zhang X., WU R., Wang M., Wang L., 2020, A High- A. H. Gad, S. E. E. Abdalazeem, O.
A. Abdelmegid and H. Mostafa, Low power and area SHA-256 hardware accelerator on Virtex-7
FPGA, 2020 2nd Novel Intelligent and Leading Emerging Sciences Conference (NILES),
pp. 181-185
binti Suhaili S., Watanabe T., 2017, Design of high-throughput SHA-256 hash function
based on FPGA, 2017 6th International Conference on Electrical Engineering and Informatics
(ICEEI), pp. 1-6
Kahri F., Mestiri H., Bouallegue B., Machhout M., 2015, Efficient FPGA hardware implementation
of secure hash function SHA-256/Blake-256, 2015 IEEE 12th International Multi-Conference
on Systems, Signals & Devices (SSD15), pp. 1-5
Michail H., Milidonis A., Kakarountas A., Goutis C., 2005, Novel high throughput implementation
of SHA-256 hash function through pre-computation technique, 2005 12th IEEE International
Conference on Electronics, Circuits and Systems, pp. 1-4
Chaves R., Kuzmanov G., Sousa L., Vassiliadis S., 2006, Improving SHA-2 Hardware Implemen-tations,
Lecture Notes in Computer Science, pp. 298-310
Glabb R., Imbert L., Jullien G., 2007, Multi-mode operator for SHA-2 hash functions,
Journal of Systems Architecture, Vol. 53, No. 2-3, pp. 127-138
Cortez D. M. A., Sison A. M., Medina R. P., 2020, Cryptographic randomness test of
the modified hashing function of sha256 to address length extension attack, Proceedings
of the 2020 8th International Conference on Communications and Broadband Networking,
pp. 24-28
Roshdy R., Fouad M., Aboul-Dahab M., 2013, Design And Implementation A New Security
Hash Algorithm Based On Md5 And Sha-256, International Journal of Engineering Sciences
& Emerging Technologies, pp. 29-36
Author
Junhyuk Baik received the B.S. degree from the School of Computer Science and Engineering
at Kyung-pook National University, Daegu, the Republic of Korea in 2022, where he
is currently pursuing an M.S. degree. His research interests include appro-ximate
computing and hardware design.
Yongtae Kim received the B.S. and M.S. degrees in electrical engineering from the
Korea University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and the
Ph.D. degree from the Department of Electrical and Computer Engineering from the Texas
A&M University, College Station, TX, in 2013. From 2013 to 2018, he was a software
engineer with Intel Corporation, Santa Clara, CA. Since 2018, he has been with the
School of Computer Science and Engineering at Kyungpook National University, Daegu,
the Republic of Korea, where he is currently an assistant professor. His research
interests are energy-efficient integrated circuits and systems, particularly neuromorphic
computing and approximate computing, and new memory devices and architectures.