(Yongchul Jung)
1
(Jaechan Cho)
1
(Seongjoo Lee)
2
(Yunho Jung)
1
-
(School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si,
South Korea)
-
(Department of Information and Communication Engineering, Sejong University, Seoul,
South Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
FFT, hand gesture recognition, MDC, radar
I. INTRODUCTION
Research on human-computer interaction (HCI) has become increasingly important as
user-friendly interfaces can help to directly employ natural communication and operation
skills of humans for controlling the myriad of currently available electronic devices.
Hand gesture recognition (HGR) is one of the simplest ways for interacting with a
computer, as the movement and orientation of the hands and fingers can be encoded
to easily command devices (1-5). Most gesture recognition systems are based on computer vision and usually rely on
a camera and an image recognition algorithm. However, vision based HGR presents several
limitations under conditions such as variable lighting and dynamic background, and
can be ineffective for real-time interaction. Moreover, a high-resolution and speed
camera is required to acquire fast and fine gestures of the fingers, and thus the
hardware cost can be high. Unlike vision technology, radio detection and ranging (radar)
is insensitive to variable lighting, and an HGR system using this technology can feature
reduced cost, computational complexity, and power consumption (6).
The fast Fourier transform (FFT) processor is one of the most complex units in radar
systems. Therefore, reducing its complexity can notably contribute to a more efficient
hardware design (7). The number of FFT points N for radar system is given by
where ${\lambda}$ is the wavelength of the radar signal and ${\Delta}$v is velocity
resolution of target (8). T is maximum pulse repetition interval (PRI), which is defined as
where v$_{\mathrm{max}}$ is the maximum velocity of target, which for finger movement
can reach up to 18 km/h. Likewise, a suitable velocity resolution for HGR system is
0.05 km/h. Moreover, at least a three-channel receiver is required to estimate the
azimuth and altitude angles from the phase difference among pairs of signals (9), and therefore the FFT processor for HGR radar should support at least 600\textasciitilde{}720-point
operation for three-channel input data streams (10). To further optimize the FFT calculation, radix-2$^{2}$ or radix-2$^{3}$ algorithms
allow to reduce the number of nontrivial multiplications and can be adopted in HGR
radar. Still, the radix-2 architecture requires a 1024-point FFT processor, thus increasing
the hardware requirements (11).
Multi-channel FFT processors are commonly designed by using one FFT processor per
data stream. In addition, a pipeline FFT architecture using single-path delay feedback
(SDF) provides the lowest number of nontrivial multiplications, which are the most
complex operations during FFT calculation. The increasing hardware complexity of implementing
SDF in multi-channel systems can be reduced using a multipath delay commutator (MDC)
architecture to simultaneously handle multiple input data streams through a single
FFT processor (12). This alternative has been shown to be more area efficient than the SDF architecture
for multiple channels.
In this paper, we propose a three-channel 729-point FFT processor for HGR radar based
on radix-3$^{2}$ algorithm and MDC architecture to minimize the number of non-trivial
multiplications. The rest of the paper is organized as follows. Section 2 presents
the FFT for HGR, and the hardware architecture of the proposed FFT processor is described
in Section 3. Section 4 presents the design and implementation of the proposed FFT
processor. In Section 5, we draw some conclusions.
II. FFT FOR HGR
The N-point discrete Fourier transform (DFT) is defined as
Based on the divide and conquer algorithm, indices n and k can be written as
where 0 ${\leq}$ n$_{1}$ ${\leq}$ 2, 0 ${\leq}$ k$_{1}$ ${\leq}$ 2, 0 ${\leq}$ n$_{2}$
${\leq}$ 2, 0 ${\leq}$ k$_{2}$ ${\leq}$ 2, 0 ${\leq}$ n$_{3}$ ${\leq}$ (N/9 - 1),
and 0 ${\leq}$ k$_{3}$ ${\leq}$ (N/9 - 1). Replacing (4) and (5) in (3) we obtain
In (6), BF$_{3}$((N/9)n$_{2 }$+ n$_{3}$, k$_{1}$) corresponds to the radix-3 decimation-in-frequency
(DIF) butterfly operator expressed as
Similarly, H (n$_{3}$, k$_{1}$, k$_{2}$) can be written as
Therefore, (6) comprises a two-step radix-3 butterfly operation and a N/9-point DFT (20). Term$W_{N}^{nk}$is a twiddle factor, whose multiplication can be divided into a
trivial and a non-trivial multiplication according to indices n and k. Therefore,
the$W_{9}^{n_{2}k_{1}}$multiplication in (6) can be expanded as
where$~ n_{2}k_{1}\in \left\{0,1,2,4\right\}. $As$W_{9}^{n_{2}k_{1}}$is a simply constant
scaling, (9) can be designed by a trivial multiplication using shifters and adders, and the radix-3$^{2}$
algorithm has the same number of non-trivial multiplications as radix-9 algorithm
but maintaining the butterfly structure of radix-3 algorithm.
Table 1 lists the hardware requirements of three-channel R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC
FFT processors, where R denotes the radix algorithm. To compare these FFT architectures,
the area of complex multiplier is given in terms of equivalent adders. If a 16-bit
complex multiplier is implemented with three real multipliers and five real adders,
it is equivalent to fifty adders (13). As shown in Table 1, the proposed implementation, R3$^{2}$MDC diminishes the number of adders by 35.8%
compared to the R2$^{2}$SDF architecture, and by 24.3% compared to the R2$^{3}$SDF
architecture. Fig. 1 shows the required number of adders according to the number of FFT points. The proposed
algorithm can be considered as the most area-efficient for a number of FFT points
between 2$^{3q}$ + 1 and 3$^{\mathrm{2q}}$, with q {\textgreater} 1.
Table 1. Hardware requirements for 3-channel R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC
architectures
|
Complex Multiplier
|
Complex
Adder
|
Total
Adder
|
Reduction
(%)
|
Trivial
|
Non
trivial
|
R2$^{2}$SDF
|
15
|
12
|
60
|
720
|
-
|
R2$^{3}$SDF
|
18
|
9
|
80
|
610
|
15.3
|
R3$^{2}$MDC
|
9
|
6
|
81
|
462
|
35.8
|
Fig. 1. Number of adders for R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC architectures
according to the number of FFT points.
III. FFT PROCESSOR HARDWARE ARCHITECTURE
Fig. 2 shows the hardware architecture of the proposed three-channel 729-point FFT processor,
which consists of data mapping modules DMM1 to DMM5, radix-3 butterfly modules R3BM1
to R3BM6, and a data reordering module (DRM). DMM1 consists of a multiplexer and a
dual-port random access memory (DPRAM) for reconstructing the input data stream and
transmitting it to the next stage. R3BM1, R3BM3, and R3BM5 are composed of one radix-3
butterfly (R3BF) operator and three trivial multipliers, whereas R3BM2 and R3BM4 are
composed of one R3BF operator and three nontrivial multipliers, and R3BM6 is composed
of only one R3BF operator. DMM2 to DMM6 reconstruct the input data using a delay component
and a commutator and transfer the reconstructed data to the next stage. Finally, the
DRM reconstructs the outputs of module R3BM6 with a structure similar to that of DMM1,
consisting of multiplexer and DPRAM.
Fig. 2. The hardware architecture of the proposed FFT processor.
1. DMM1 and DRM
Both DMM1 and DRM reconstruct data using a delay component for the next stage. In
general, the delay component is implemented as a shift register, and the area that
this operation occupies notably increases with the FFT length and number of data paths.
In (14), a method for reducing the area using DPRAM instead of shift registers is proposed.
Likewise, in the proposed FFT processor, we design DMM1 and DRM using three DPRAM
modules and six multiplexers as shown in Fig. 3, where the data scheduling of DMM1 is illustrated in Fig. 4. First, input data streams a0 to a728, b0 to b728, and c0 to c728 enter into DMM1
(Fig. 4(a)) and are reconstructed by the left-side multiplexers (Fig. 4(b)). These reconstructed data streams are registered in the corresponding DPRAM and
then retrieved (Fig. 4(c)). Finally, the DPRAM outputs are reordered by the right-side multiplexers (Fig. 4(d)) and transferred to R3BM1.
Fig. 3. Block diagram of DMM1.
Fig. 4. Data reordering of DMM1.
Fig. 5. Hardware architecture for multiplication of 0.765625 - j0.640625.
2. R3BM1, R3BM3 and R3BM5
The twiddle factors of R3BM1, R3BM3, and R3BM5 are$W_{9}^{~ n_{2}k_{1}}$ as in (6). The possible twiddle factors are $W_{9}^{~ 1}$,$W_{9}^{~ 2}$and$W_{9}^{~ 4}$because$n_{2}k_{1}\in
\left\{0,1,2,4\right\},$and expressed as
where 0.7660 and 0.6428 being approximated as
The calculation in (13) and (14) can be implemented with shifters and adders as shown in Fig. 5. Therefore, multiplying twiddle factor$W_{9}^{~ 1}$is a trivial operation because
it can be implemented using shifters and adders. Likewise, multiplying twiddle factors$W_{9}^{~
2}$and$W_{9}^{~ 4}$can be implemented by using (15) to (18) as shown in Fig. 6 and 7, respectively.
Fig. 6. Hardware architecture for multiplication of 0.171875 - j0.984375.
Fig. 7. Hardware architecture for multiplication of -0.93750-j0.34375.
3. DMM2 to DMM6
DMM2 to DMM6 are composed of shift registers and commutators without DPRAM, because
their delays are small. As shown in Fig. 2, DMM2 has shift registers with sizes of 81 and 162, whose data reordering pattern
is depicted in Fig. 8. The three input data streams of DMM2 are simultaneously transferred from R3BM1,
with the second and third data streams delayed by shift registers of sizes 81 and
162, respectively. Commutators switch each delayed data and introduce additional delays
into the first and second data streams, as shown in Fig. 8. The aligned data streams are transferred to the next stage, R3BM2. DMM3 to DMM6
are similarly implemented with shift registers of different sizes and the same alignment
pattern of DMM2.
Fig. 8. Data reordering of DMM2.
IV. DESIGN AND IMPLEMENTATION
The proposed FFT processor was designed using hardware description language (HDL)
and synthesized to gate-level circuits using a standard cell library of 65 nm CMOS
process. A 12-bit word for the real and imaginary data paths was selected to satisfy
the requirement for a signal-to-quantization-noise ratio (SQNR) of 40 dB, as detailed
in Table 2. The proposed architecture results in a die size of 0.81 mm$^{2}$ with the 219K logic
gates and memory of 18 KB, as specified in Table 3 and 4. Given that R3BM2 and R3BM4 include three nontrivial multipliers, which are the most
complex operations in the FFT processor, their area proportion is high. Fig. 9 shows the layout of the proposed FFT processor, which was compared to similar recent
developments (15-19) as summarized in Table 5. For a fair comparison, we normalized the area as
where M, N, and Tech are the number of the parallel data paths, the FFT size, and
the process technology in nanometers, respectively. Table 5 shows that the normalized area of the proposed FFT processor is the smallest among
the different processors, because it requires the minimum number of nontrivial multipliers.
Table 2. SQNR for word lengths between 10 and 15 bits
Bit
|
SQNR (dB)
|
Bit
|
SQNR (dB)
|
10
|
32
|
13
|
51
|
11
|
40
|
14
|
58
|
12
|
46
|
15
|
63
|
Table 3. Logic gate count of the proposed FFT processor
Module
|
Gate Count (K)
|
Proportion (%)
|
R3BM1
|
15
|
6.85
|
DMM2
|
34
|
15.53
|
R3BM2
|
43
|
19.63
|
DMM3
|
30
|
13.7
|
R3BM3
|
15
|
6.85
|
DMM4
|
12
|
5.48
|
R3BM4
|
43
|
19.63
|
DMM5
|
6
|
2.74
|
R3BM5
|
15
|
6.85
|
DMM6
|
5
|
2.28
|
R3BM6
|
1
|
0.46
|
Total
|
219
|
100
|
Table 4. Key features of the implemented FFT processor
Parameter
|
Value
|
Technology
|
65 nm 1P9M CMOS
|
Operating Frequency
|
96 MHz
|
Internal Memory
|
18K bytes
|
Voltage
|
I/O
|
2.5 V
|
Core
|
1.2 V
|
Core Size
|
0.81 mm$^{2}$
|
Table 5. Comparison of the proposed FFT processor with recent developments
|
[15]
|
[16]
|
[17]
|
[18]
|
[19]
|
This work
|
FFT Length
|
1024
|
128~2048
|
1024, 8192
|
128~2048
|
1024 ~ 8192
|
729
|
FFT Architecture
|
MDC
|
SDF
|
Single
|
MDC
|
SDF
|
MDC
|
FFT Algorithm
|
Radix-4
|
Radix-2
|
Radix-2$^{3}$
|
Radix-4/8
|
Balanced binary-tree
|
Radix-3$^{2}$
|
Number of Channel
|
4
|
1
|
1
|
4
|
1
|
3
|
Frequency (MHz)
|
30
|
40
|
56
|
40
|
112
|
96
|
Word Length (Bit)
|
16
|
32
|
22
|
12
|
24
|
24
|
SQNR (dB)
|
N.A.
|
N.A.
|
40
|
N.A.
|
55
|
46
|
Max. Throughput @ R Hz
|
4R
|
R
|
R
|
4R
|
R
|
3R
|
Technology (nm)
|
65
|
180
|
180
|
90
|
180
|
65
|
Area (mm$^{2}$)
|
8.299
|
6.76
|
4.84
|
3.1
|
3.52
|
0.81
|
Normalized Area
|
207.48
|
80.14
|
48.55
|
36.75
|
35.31
|
28.39
|
Normalized Area / SQNR
|
N.A.
|
N.A.
|
1.214
|
N.A.
|
0.642
|
0.617
|
Fig. 9. Layout of the proposed FFT processor
V. CONCLUSIONS
We propose an area-efficient FFT processor for HGR radar. The radix-3$^{2}$ algorithm
and MDC pipelined hardware architecture allow to minimize the number of nontrivial
multiplications and support the three-channel 729-point FFT operation required for
HGR. The FFT processor is implemented using a 65 nm CMOS process, whose normalized
area is the smallest among similar FFT processors. Moreover, the proposed algorithm
and hardware architecture can achieve high performance in a small area, and we expect
it to be suitable for compact HGR radar.
ACKNOWLEDGMENTS
This work was supported by the Technology Innovation Program, 10080619, funded by
the Ministry of Trade, Industry and Energy (MOTIE, Korea) and CAD tools were supported
by IDEC.
REFERENCES
Pickering C. A., Mar. 2005, The search for a safer driver interface: A review of gesture
recognition human machine interface, Computing & Control Engineering Journal., Vol.
16, No. 1, pp. 34
Pavlovic V. I., Sharma R., Huang T. S., Jul. 1997, Visual interpretation of hand gestures
for human-computer interaction: A review, Pattern Analysis and Machine Intelligence,
IEEE Transactions on, Vol. 19, pp. 677-695
Suarez J., Murphy R. R., Sep 2012, Hand gesture recognition with depth images: A review,
Robot and Human Interactive, IEEE Intenational Symposium on, pp. 411-417
Parada-Loira F., Gonzalez-Agulla E., Alba-Castro J., June 2014, Hand gestures to control
infotainment equipment in cars, Intelligent Vehicles, IEEE Symposium on, pp. 1-6
Jung Y., Lim E., Jung Y., Feb. 2017, Low-Complexity 3-Channnel FFT Processor for Hand
Gesture Recognition Radar Systems, Korea Conference on Semiconductors, pp. 77
Lien J., Gillian N., Karagozler M.. E., Amlihood P., Schwesig C., Olson E., Raja H.,
Poupyrev I., July 2016, Soli: Ubiquitos Gesutre sensing with millimeter wave radar,
ACM Transaction on Graphics, Vol. 35, No. 4
Nathanson F. E., Reilly J. P., Cohen M. N., 1999, Radar Design Principles, 2nd ed.,
SciTech Publishing
Kronauage M., Rohling H., Oct. 2014, New Chirp Sequence Radar Waveform, Aerospace
and Electronic Systems, IEEE Transaction On, Vol. 50, No. 4, pp. 2870-2877
Molchanov P., Gupta S., Kim K., Pulli K., May 2015, Multi-sensor system for driver's
hand-gesture recognition, Automatic Face and Gesture Recognition, IEEE Conference
on, pp. 1-8
Molchanov P., Gupta S., Kim K., Pulli K., May 2015, Short-range FMCW monopulse radar
for hand-gesture sensing, Radar Conference IEEE, pp. 1491-1496
Nguyen T., Lee H., Feb. 2017, High-throughput Low-complexity Mixed-radix FFT Processor
using a Dual-path Shared Complex Constant Multiplier, Semiconductor Technology and
science, IEIE Journal of, Vol. 17, No. 1
Jung Y., Yoon H., Kim J., Feb 2003, New efficient FFT algorithm and pipeline implementation
results for OFDM/DMT applications, Consumer Electronics, IEEE Transaction on, Vol.
49, No. 1, pp. 14-20
Sansaloni T., Perez-Pascual A., Torees V., Valls J., Sep. 2005, Efficient pipeline
FFT processors for wireless LAN MIMO-OFDM systems, Elecotro-nics Letters, Vol. 41,
No. 19, pp. 1043-1044
Yoshizawa S., Nishi K., May 2008, An Area and Power Efficient Pipeline FFT Processor
for 8x8 MIMO-OFDM Systems, Circuits and Systems, IEEE International Symposium on,
pp. 1248-1251
Seok M., et al. , Feb., 2011, A 0.27V 30MHz 17.7nJ/transform 1024-pt complex FFT core
with super-pipelining, Solid-State Circuits Conference, 2011. ISSCC 2011. Digest of
Technical Papers, IEEE Internaltional, pp. 342-344
Chhatbar T. D., Darji A. D., Jan., 2009, High Speed High Throughput FFT /IFFT Processor
ASIC for Mobile Wimax, Emerging Trends in Engineering and Technology, 2nd IEEE International
Conference on, pp. 402-405
Lin Y. W., Liu H. Y., Lee C. Y., Nov., 2004, Dynamic scaling FFT processor for DVB-T
applications, Solid State Circuits, IEEE Journal of, Vol. 39, No. 11, pp. 2005-2013
Yang K. J., Tsai S. H., Chuang G. C. H., Apr., 2013, MDC FFT/IFFT Processor With Variable
Length for MIMO-OFDM Systems, Very Large Scale Integration Systems, IEEE Transactions
on, Vol. 21, No. 4, pp. 720-731
Lee H., Park I., Apr., 2007, Balanced Binary-Tree Decomposition for Area-Efficient
Pipelined FFT Processing, Circuits and Systems I: Regular Papers, IEEE Transactions
on, Vol. 54, No. 4, pp. 889-900
Shih X. -Y., Liu Y. -Q., Chou H. -R., Jun., 2017, 48-Mode Reconfigurable Design of
SDF FFT Hardware Architecture Using Radix-3^2 and Radix-2^3 Design Approaches, IEEE
TCAS-I, Vol. 64, No. 6, pp. -
Author
Yongchul Jung received the B.S. and M.S. degrees in the School of Electronics and
Information Engi-neering from Korea Aerospace University, Goyang, Korea, in 2015 and
2017, respectively. He is currently working towards the Ph.D. degree in the School
of Electronics and Information Engineering, Korea Aerospace University. His research
interests include the signal processing algorithm and VLSI implementation for the
radar signal processing system.
Jaechan Cho received the B.S. and M.S. degrees in the School of Electronics and Information
Engi-neering from Korea Aerospace University, Goyang, Korea, in 2015 and 2017, respectively.
He is currently working towards the Ph.D. degree in the School of Electronics and
Information Engineering, Korea Aerospace University. His research interests include
the signal processing algorithm and VLSI implementation for the image processing system.
Seongjoo Lee received his B.S., M.S., and Ph.D. degrees in Depart-ment of Electrical
and Electronic Engineering from Yonsei University, Seoul, Korea, in 1993, 1998, and
2002, respectively. From 2002 to 2003, he was a senior research engineer at the IT
SOC Research Center, Yonsei University, Seoul, Korea. From 2003 to 2005, he was a
senior engineer in Samsung Electronics Co. Ltd., Suwon, Korea. He was a research professor
at the IT Center and the IT SoC Research Center, Yonsei University, Seoul, Korea from
2005 and to 2006. He is currently a professor in the Department of Information and
Communication Engineering at Sejong University, Seoul, Korea. His current research
interests include PN code acquisition algorithms, cdma2000 modem SoC design, CDMA
communication, and SoC design for image processing.
Yunho Jung received the B.S., M.S., and Ph.D. degrees in Department of Electrical
and Electronic Engineering from Yonsei University, Seoul, Korea, in 1998, 2000, and
2005, respectively. From 2005 to 2007, he was a senior engineer in Samsung Electronics,
Suwon, Korea. From 2007 to 2008, he was a research professor at Institute of TMS Information
Technology, Yonsei University, Seoul, Korea. He is currently a professor in the School
of Electronics and Information Engineering, Korea Aerospace University, Goyang, Korea.
His research interests include the signal processing algorithm and VLSI implementation
for the wireless communication and image processing systems.