I. INTRODUCTION
Recently, various sensor systems for Internet of things (IoT) devices have been proposed
and developed (1-4). Image processing is a key technology supporting such sensor systems and has been
actively studied for applications such as self-driving cars (5), factory automation systems (6), and security technology (7). The demand for processing high-resolution (4K or 8K) images to detect
objects more precisely is increasing, because high-performance camera modules
are readily available (8). These systems require small size, low power consumption, and high speed. It is
difficult, however, for a central processing unit (CPU) to process such large amounts of data in real
time, because it operates sequentially (9).
In contrast, a field-programmable gate array (FPGA) has many advantages including
high-speed processing derived from its parallel operation. Thus, it is suitable for
application in image processing involving large-scale calculation at high speed (9-12).
Fig. 1 shows a typical FPGA-based image-processing system. It consists of an FPGA, memory,
and input/output devices (a camera and monitor). In most cases, a dynamic random-access
memory (DRAM) is adopted as the memory, because it has a large capacity and can be
mounted on a general FPGA board (13,14). The image data is transmitted from the camera to the monitor by the FPGA in the
following steps.
1. The captured camera images are written to the memory and held there temporarily.
2. The image data from the memory is loaded to the FPGA for image processing.
3. The results of image processing are sent to the memory and held there temporarily.
4. The processed results in the memory are output to the monitor.
As described above, an FPGA in an image processing system plays the roles of controlling
all the devices and receiving and transmitting image data at appropriate timings.
Because the frame rates of the camera and monitor are defined in advance, the memory access timing for image processing is limited.
When the interval between memory accesses for image processing is constant, reducing
the frequency of such memory accesses can avoid collision with other access requests,
but it reduces the processing speed.
Fig. 1. FPGA-based image processing system
Fig. 2. FPGA-based image processing system
We have studied a new method to dynamically control memory access for image processing
by monitoring the status of access requests from the monitor and camera controllers.
A brief operation principle was presented and preliminary experimental results were
shown in the previous report (15). In this paper, we describe our memory access method in detail and present the implementation
in an FPGA board with a DDR3 SDRAM. Also, we show that the image processing speed
of the proposed method is 1.65 times faster than that of the conventional method.
The organization of this paper is as follows. First, we explain the procedure for
memory access in an FPGA-based image processing system in Section II. Next, we describe
the problems of a conventional image processing system in Section III. Then, we propose
a method to optimize the memory access for image processing in Section IV. After that,
we show measurement results with the proposed method in Section V. Finally, we discuss
the advantages of the proposed method in Section VI, before concluding the paper in Section
VII.
II. OVERVIEW OF MEMORY ACCESS IN FPGA-BASED IMAGE PROCESSING SYSTEM
This section provides an overview of memory access in an FPGA-based image processing
system. Fig. 2 shows the configuration of the modules for the image processing in the FPGA. These
modules include device controllers for the camera, monitor, and memory, an image processing
module, and a memory arbiter. There are four kinds of memory access: monitor read
access, camera write access, and image processing read and write access. The memory
can accept only one access from one of the modules at one time. When multiple access
requests to the memory occur, only one request is accepted and processed, while the
others may be lost (corresponding to collision as mentioned in Section I). This can
cause system malfunctions. Therefore, the arbiter is usually inserted between the
modules and the memory, and it enables control of the access timing from each controller
to prevent conflicts due to multiple requests occurring at the same time (16). Thus, a memory arbiter sequentially transmits access requests to the memory controller.
In general, arbitration schemes for a memory arbiter are classified into the following
three types: round robin, first in first out (FIFO), and priority (17,18). In the round robin method, each module is given a fixed time slot for accessing the memory controller,
in an order defined in advance. When a module has a request for
memory in its turn, the request is accepted. If the module whose turn it is has no request,
no request is accepted in that slot, even when other modules have requests pending. Although collision of requests does not occur,
it is difficult to improve the processing speed because of the idle time. In the FIFO
method, requests from modules are accepted and executed in order of arrival at the
arbiter. In contrast to the round robin method, all of the available memory access time
can be used by any module, but collisions of requests occur and some requests
may be lost. In particular, a lost write request from a camera cannot be recovered.
Hence, the FIFO method is not suitable for real-time image processing. In the priority
method, a specific priority (low, medium, high) is given to each module. In image
processing, higher priorities should be assigned to the camera and
monitor because they operate at fixed frame rates. The priority method is further
classified into two types: fixed and dynamic. In the fixed type, the priorities
are given to modules in advance. Therefore, requests from the image processing module, which has lower priority,
may not be accepted. On the other hand, the dynamic type switches the priority to
the module that accesses the memory frequently, so it can reduce the idle time. However,
the dynamic-type priority arbiter is more difficult to implement because complicated
condition settings are required.
Fig. 3. Configuration and function of memory arbiter
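The difference in idle time between the round robin and fixed-priority schemes can be illustrated with a minimal Python sketch. This is only a software model under stated assumptions, not the arbiter implemented in this work: the module names, the slot granularity, and the function names round_robin_grant and fixed_priority_grant are hypothetical.

```python
from typing import List, Optional

# Hypothetical module names, listed in turn order (round robin)
# and in descending priority order (fixed priority).
MODULES = ["monitor", "camera", "process"]


def round_robin_grant(slot: int, pending: List[str]) -> Optional[str]:
    """Grant only the module that owns this slot; otherwise the slot idles."""
    owner = MODULES[slot % len(MODULES)]
    return owner if owner in pending else None


def fixed_priority_grant(pending: List[str]) -> Optional[str]:
    """Grant the highest-priority module that has a pending request."""
    for module in MODULES:
        if module in pending:
            return module
    return None


# Assumed scenario: only the image processing module requests access
# in each of six consecutive slots.
pending_per_slot = [["process"] for _ in range(6)]

rr = [round_robin_grant(slot, p) for slot, p in enumerate(pending_per_slot)]
fp = [fixed_priority_grant(p) for p in pending_per_slot]

rr_idle = rr.count(None)  # slots wasted under round robin
fp_idle = fp.count(None)  # slots wasted under fixed priority

print("round robin   :", rr, "idle slots:", rr_idle)
print("fixed priority:", fp, "idle slots:", fp_idle)
```

In this scenario the round robin model leaves the slots owned by the monitor and camera unused, whereas the priority model grants every slot to the only requester.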
First, we consider the fixed-priority type arbiter. Fig. 3 shows the configuration of the memory arbiter, which consists of an arbiter controller
and registers. Specifically, it has two registers for each module to store the access
requests from each one. The arbiter operates in the following steps, as shown in Fig. 3. Each register 2 receives the access requests from each module and transfers them
to each register 1. Then, the arbiter controller decides which request to select and
transfers the request to the memory controller in the following priority order. Note
that higher priority is given to memory access from a camera or monitor controller
that has requests at regular intervals as mentioned above.
1. Memory controller is busy: no access.
2. Monitor register 1 has a request: monitor access.
3. Camera register 1 has a request: camera access.
4. Process read register 1 has a request: process read access.
5. Process write register 1 has a request: process write access.
6. No request from any register 1: no access.
Fig. 4. Process access permission timing
Here, the memory-busy state (priority order: 1) occurs when the memory controller
is refreshing the DRAM or when the buffer in the memory controller is filled (13).
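The priority order listed above can be summarized as a small selection function. The following Python sketch is a behavioral model only, assuming Boolean flags for the memory-busy state and for each register 1; the function name select_access and the returned labels are hypothetical and do not correspond to signal names in the actual design.

```python
def select_access(memory_busy: bool,
                  monitor_req: bool,
                  camera_req: bool,
                  process_read_req: bool,
                  process_write_req: bool) -> str:
    """Return which access is forwarded to the memory controller.

    The checks follow the priority order 1 to 6 described in the text.
    """
    if memory_busy:            # 1. memory controller is busy
        return "none"
    if monitor_req:            # 2. monitor register 1 has a request
        return "monitor"
    if camera_req:             # 3. camera register 1 has a request
        return "camera"
    if process_read_req:       # 4. process read register 1 has a request
        return "process_read"
    if process_write_req:      # 5. process write register 1 has a request
        return "process_write"
    return "none"              # 6. no request from any register 1


# A camera request wins over a pending process read request.
print(select_access(False, False, True, True, False))
# While the memory controller is busy, nothing is forwarded.
print(select_access(True, True, True, True, True))
```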
III. PROBLEMS OF MEMORY ACCESS FOR IMAGE PROCESSING WITH CONVENTIONAL METHOD
Because the memory arbiter operates according to the priority order described in Section
II, memory access for image processing is permitted only if the process access permission
is high, as shown in Fig. 4. Here, we focus on either reading or writing in image processing, and for simplicity,
we assume that only one register is used.
The timings of the image processing and memory access requests are synchronized with
that of the clock signal. The frequencies of the processing and the requests are equal
to the clock frequency divided by an integer, which is a constant. Thus, the access
requests are sent to the arbiter at regular intervals. That causes some problems in
image processing, as described below.
Fig. 5 shows the flow of memory access for image processing. First, as described in Section
II, a process access request sent from the image processing module is stored in a
process register in the arbiter. When the process access permission is high, the arbiter
controller can receive the request from the register, and the memory access is executed.
To increase the image processing speed, the access request interval, t_interval, should
be as short as possible. In that case, however, a process access request may be overwritten
and lost, as shown in Fig. 5, which prevents the system from operating properly. This is because a
new request (request 6) arrives before the existing request (request 5) in the register
has been transferred to the memory. As a result, the existing request is overwritten by
the new request and lost.
Fig. 5. Process access timing in shorter interval case
Fig. 6. Process access timing in longer interval case
Fig. 7. Active and inactive periods for monitor and camera access
To avoid this problem, the interval should be long enough to prevent losing requests,
as shown in Fig. 6. Unfortunately, this decreases the speed of image processing. In addition, there are
times when the arbiter could accept a request but none is sent, so that
process access permission is wasted.
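The trade-off between a short and a long fixed interval can be reproduced with a simple tick-based model of a single process register. This Python sketch rests on assumptions that are not taken from the measurements: time is counted in abstract ticks, the permission pattern (high on one tick out of four) is invented for illustration, and the function name simulate_fixed_interval is hypothetical.

```python
def simulate_fixed_interval(interval, permission):
    """Model one process register fed at a fixed request interval.

    Returns (served, lost): ids of requests transferred to memory and
    ids of requests overwritten before they could be transferred.
    """
    register = None          # holds at most one pending request id
    served, lost = [], []
    next_id = 0
    for tick, permitted in enumerate(permission):
        if tick % interval == 0:          # module issues a new request
            if register is not None:
                lost.append(register)     # existing request is overwritten
            register = next_id
            next_id += 1
        if permitted and register is not None:
            served.append(register)       # request transferred to memory
            register = None
    return served, lost


# Assumed permission pattern: high on ticks 3, 7, 11, 15, 19, 23.
permission = [tick % 4 == 3 for tick in range(24)]

short_served, short_lost = simulate_fixed_interval(2, permission)
long_served, long_lost = simulate_fixed_interval(4, permission)

print("interval 2: served", short_served, "lost", short_lost)
print("interval 4: served", long_served, "lost", long_lost)
```

With the shorter interval, every second request is overwritten in this model. With the longer interval, nothing is lost, but the request rate is halved regardless of how much permission is available.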
Fig. 8. Process access permission timing in inactive period
Fig. 9. Process access permission timing in inactive period
Furthermore, there are two distinct periods, in which access requests from the camera
and monitor controllers occur frequently (active period) or rarely (inactive period),
as shown in Fig. 7. This is due to the specifications of the communication protocol for the camera and
monitor. During an inactive period, more process access permission remains in comparison
with the case of an active period, as shown in Fig. 8. This means that much time is wasted in terms of improving the image processing speed,
as shown in Fig. 9.
IV. MEMORY ACCESS OPTIMIZATION METHOD
The problems described in Section III are derived from the constant intervals between
process access requests. It is desirable to control the request intervals dynamically.
For example, by monitoring the memory status, more requests for image processing can be accepted when there is
little other memory access, while process access can be limited during an active
period. Therefore, dynamic memory access is effective
for both preventing the loss of processing requests and reducing the wasted time during inactive
periods.
Hence, we propose a method for dynamically controlling the intervals of memory access
requests according to the memory state. This is a fixed-type priority
method, and it differs from the dynamic-type priority method described in Section
II because the priority given to each module is fixed in our method. When the image-processing
module has priority (priority order 4 or 5 in Section II), as many requests for image processing
as possible are accepted while the memory status is monitored. Consequently,
the arbiter can accept the requests dynamically.
Fig. 10. Proposed stop-processing function
In general, because of its algorithm, the image processing module often makes access
requests for reading and writing simultaneously. When both the process read and process
write registers 2 already have requests, they may be overwritten with the following
requests. Thus, we designed a stop-processing function so that the image processing
module temporarily stops giving the arbiter access requests when there is a request
in the process read or process write register 2, as shown in Fig. 10. Here, we assume that image processing has priority (priority order: 4 or 5) as described
in Section II. Each register 2 outputs an “existence” signal indicating whether it holds
a request, and an OR block combines these signals. In this way, the OR output works as a stop-processing
signal for the image processing module. After the arbiter controller completes transferring
all the requests of the read and write registers 1, and the requests of the read and/or
write registers 2 are transferred to the registers 1, the registers 2 have no requests.
At this time, both existence signals are set to low, and the stop-processing signal
(OR output) is turned off. Then, the operation of the image processing module is restarted.
Fig. 11. Optimization of process access timing in inactive period
Therefore, the image processing module’s operation can be switched according to the
stop-processing signal in relation with the memory state.
Fig. 11 shows the timing chart for memory access using the proposed method. While the registers
1 and 2 have no requests, or when the process access permission is enabled, the image
processing module continues access requests. When the process access permission is
not enabled, access requests are stopped by the stop-processing function. In this
way, the proposed method ensures that, in principle, no requests are lost and no time
is wasted.
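The behavior of the stop-processing function can be sketched with a tick-based Python model of one request stream passing through register 2 and register 1. This is an illustrative model under stated assumptions rather than the FPGA implementation: a single stream stands in for the read and write paths (so the OR block reduces to one existence signal), the permission pattern is invented, and the names simulate and use_stop are hypothetical. Setting use_stop to False models a module that issues a request on every tick without checking the signal.

```python
def simulate(permission, use_stop):
    """Model register 2 -> register 1 -> memory for one request stream.

    Returns (served, lost, issue_ticks).
    """
    reg1 = None   # register 1: read by the arbiter controller
    reg2 = None   # register 2: written by the image processing module
    served, lost, issue_ticks = [], [], []
    next_id = 0
    for tick, permitted in enumerate(permission):
        # Arbiter controller transfers register 1 when access is permitted.
        if permitted and reg1 is not None:
            served.append(reg1)
            reg1 = None
        # Register 2 hands its request to register 1 when that is free.
        if reg1 is None and reg2 is not None:
            reg1, reg2 = reg2, None
        # The existence signal of register 2 acts as the stop signal.
        stop = use_stop and reg2 is not None
        if not stop:
            if reg2 is not None:
                lost.append(reg2)   # only reachable when use_stop is False
            reg2 = next_id
            issue_ticks.append(tick)
            next_id += 1
    return served, lost, issue_ticks


# Assumed pattern: 12 ticks with permission on 1 tick in 4 (active period),
# followed by 12 ticks with permission on every tick (inactive period).
permission = [tick % 4 == 3 for tick in range(12)] + [True] * 12

served, lost, issue_ticks = simulate(permission, use_stop=True)
_, lost_without_stop, _ = simulate(permission, use_stop=False)

print("served:", served)
print("lost with stop signal:", lost)
print("issue ticks:", issue_ticks)
print("lost without stop signal:", len(lost_without_stop))
```

In this model the interval between issued requests stretches while permission is scarce and shrinks to one tick when permission is always available, and no request is overwritten as long as the stop signal is honored.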
V. MEASUREMENT RESULTS
As listed in Table 1, we implemented an FPGA-based image processing system with the proposed method and
examined its operation characteristics. The image size was 640 × 480 pixels, and the
maximum frame rate was 30 fps. An easily implemented low-pass filter (LPF) (20) was adopted for image processing to confirm the operation of the proposed system.
The process of the LPF was performed by calculating the average pixel value in a patch
size of 10 × 10 pixels over the entire input image of 640 × 480 pixels. The output
frame rate was set to 60 fps because of the monitor’s specifications. Fig. 12 shows the results of image processing, which confirmed that the LPF operated properly
and the image was averaged as expected.
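As a rough software reference for the averaging operation, the following Python sketch replaces each patch of an image with its mean value. It is not the hardware LPF used in the measurement: whether the 10 × 10 patches overlap is not stated in the text, so non-overlapping patches are assumed here, integer division is assumed for the mean, and the function name block_average and the small test image are hypothetical.

```python
def block_average(image, patch):
    """Replace each non-overlapping patch x patch block by its mean value.

    image is a list of rows of integer pixel values.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = total // (len(rows) * len(cols))  # integer mean (assumed)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# Small synthetic example: a 4 x 4 image averaged with 2 x 2 patches.
# For the setup in the text, patch would be 10 on a 640 x 480 image.
image = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
smoothed = block_average(image, 2)
for row in smoothed:
    print(row)
```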
Table 1. Measurement conditions
Device | Manufacturer | Model
Evaluation board | Digilent | Nexys Video
FPGA | Xilinx | Artix-7 XC7A200T-1SBG484C
Memory | Micron Technology | DDR3 SDRAM MT41K256M16HA-187E (19)
Camera | OmniVision | OV5642 camera module (CMOS image sensor)
Fig. 12. LPF processing result
To examine the validity of the proposed method, both the conventional and proposed
processing systems were implemented on an FPGA. Then, the output signals, such as
monitor access, camera access, memory-busy, and LPF access signals, were evaluated
by using a logic analyzer. Here, the LPF access included both read and write accesses
for LPF image processing. As shown in Fig. 13, the memory access interval for image processing was a fixed value of 140 ns, which
corresponded to only 5 accesses within 640 ns in the conventional method. During the
active period indicated by (1) in Fig. 14(a), the number of memory accesses with the proposed method was 9 within 640 ns, which
was 1.8 times that with the conventional method, as shown in Fig. 14(b). Furthermore, during the inactive period indicated by (2) in Fig. 14(a), the number of memory accesses was 14 within 640 ns, which was 2.8 times that with the conventional method,
as shown in Fig. 14(c).
Next, as shown in Fig. 15, the image processing speed of the conventional method was 18.6 fps, which was determined
by the minimum fixed interval to avoid the loss of image processing access requests.
In contrast, the proposed method increased the processing speed to 30.7 fps, which was 1.65 times faster.
The interval between image processing accesses changed dynamically between 10 and
800 ns, depending on the memory access status.
Fig. 13. Measurement results of memory access status by logic analyzer in conventional method
Fig. 14. Measurement results of memory access status by logic analyzer in proposed method
Fig. 16 shows the measured time for processing one image. The total time of
the proposed method was 32.8 ms, which was 60% of that of the conventional
method. As shown in Fig. 16, the memory-free time, meaning wasted time with no access from any module, was significantly
reduced. Although we used two registers for each module in this implementation, the
memory-free state can be almost completely eliminated by increasing the number of process registers as FPGA resources
allow. An increase in the memory-busy time means that memory is used efficiently,
because the buffer in the memory controller is easily saturated by increasing the
number of accesses.
Fig. 15. Measurement results for image processing speed
Fig. 16. Measurement results of time for processing one image
From the above results, the proposed method improved memory access for image processing
and achieved 1.65 times faster processing as compared to the conventional method.
Overall, the effectiveness of this method was verified.
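The ratios quoted in this section can be cross-checked from the reported values with a few lines of Python. The only assumption is that the first conventional access falls at the start of the 640 ns window, so that the access count is the interval ratio rounded up; the variable names are hypothetical.

```python
import math

window_ns = 640
fixed_interval_ns = 140

# Accesses at 0, 140, 280, 420, and 560 ns (assumed to start at 0 ns).
conventional_accesses = math.ceil(window_ns / fixed_interval_ns)

active_ratio = 9 / conventional_accesses      # proposed, active period
inactive_ratio = 14 / conventional_accesses   # proposed, inactive period

speedup = 30.7 / 18.6                         # frame rate ratio

conventional_frame_ms = 1000 / 18.6           # time per image at 18.6 fps
time_ratio = 32.8 / conventional_frame_ms     # proposed / conventional

print(conventional_accesses)
print(round(active_ratio, 2), round(inactive_ratio, 2))
print(round(speedup, 2))
print(round(conventional_frame_ms, 1), round(time_ratio, 2))
```

The computed values agree with the reported 5 accesses, the 1.8 and 2.8 access ratios, the 1.65 speed ratio, and a processing time of roughly 60% of the conventional one.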
VI. DISCUSSIONS
We consider the advantages of the proposed method and the difficulties in implementing an image-processing
system with dynamic memory access control. Because our method does not require
changing the priority (fixed-priority type), the memory
arbiter is easier to design than one of the dynamic-priority type. In addition, when the image-processing module has
priority, the proposed arbiter decides whether to accept a new request for image processing according
to the stop signal, which depends on the memory state.
Therefore, it does not affect the operations of the camera and monitor,
and the arbitration is achieved dynamically in a simple configuration.
However, the intervals between the image-processed outputs are not fixed, which makes
it difficult to adjust the timing of the monitor output or additional processing.
In our system, the variation in the intervals is absorbed by the memory because the
image-processed data is written to the memory temporarily and then output to the monitor.
In general, however, the operation speed of the memory is not very high. For further improvement
of the processing speed, the processed data needs to be transferred directly
to the monitor or other image-processing modules with appropriate timing adjustment.
VII. CONCLUSIONS
We developed a novel memory access method to improve the processing speed in an FPGA-based
image processing system. This method dynamically controls the intervals between memory
access requests for image processing by monitoring the memory status. We implemented
an image processing system with the proposed method and examined its characteristics.
In an implementation using the conventional method, the access interval was fixed,
which limited the processing speed. On the other hand, with the proposed method, the
number of memory accesses for image processing was 1.8 times (in an active period) and 2.8 times
(in an inactive period) that of the conventional method, and the overall image processing speed was
1.65 times faster, without losing any memory
access requests.
ACKNOWLEDGMENTS
Part of this research was supported by Grants-in-Aid for Scientific Research, from
the Japan Society for the Promotion of Science.
REFERENCES
Singh D., Tripathi G., Jara A. J., March 2014, A survey of Internet-of-things: Future
vision, architecture, challenges and services, in 2014 IEEE World Forum on Internet
of Things (WF-IoT), pp. 287-292
Chen S., Xu H., Liu D., Hu B., Wang H., August 2014, A vision of IoT: Applications,
challenges, and opportunities with China perspective, IEEE Internet of Things Journal,
Vol. 1, No. 4, pp. 349-359
Miraz M. H., Ali M., Excell P. S., Picking R., September 2015, A review on Internet
of things (IoT), Internet of everything (IoE) and Internet of nano things (IoNT),
in 2015 Internet Technologies and Applications (ITA), pp. 219-224
Yin Y., Zeng Y., Chen X., Fan Y., March 2016, The Internet of things in healthcare:
An overview, Journal of Industrial Information Integration, Vol. 1, pp. 3-13
Altera, December 2013, FPGA-based control for electric vehicle and hybrid electric
vehicle power electronics, available from:
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01210-electric-vehicles.pdf
[last accessed August 2020]
Rahmatov N., Paul A., Saeed F., Hong W., Seo H., Kim J., October 2019, Machine learning-based
automated image processing for quality management in industrial Internet of things,
International Journal of Distributed Sensor Networks, Vol. 15, No. 10
Du Y., Ives R., Nevel A., She J., January 2011, Editorial advanced image processing
for defense and security applications, EURASIP Journal on Advances in Signal Processing,
Vol. 2010
Matsuo Y., Sakaida S., November 2017, Super-resolution for 2K/8K television using
wavelet-based image registration, in 2017 IEEE Global Conference on Signal and Information
Processing (GlobalSIP), pp. 378-382
Asano S., Maruyama T., Yamaguchi Y., August 2009, Performance comparison of FPGA,
GPU and CPU in image processing, in 2009 International Conference on Field Programmable
Logic and Applications, pp. 126-131
Torres-Huitzil C., Arias-Estrada M., 2004, Real-time image processing with a compact
FPGA-based systolic architecture, Real-Time Imaging, Vol. 10, No. 3, pp. 177-187
Altera, 2013, Real-time challenges and opportunities in SoCs, available from:
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01190-real-time-socs.pdf
[last accessed August 2020]
Hernandez-Lopez A., Torres-Huitzil C., Garcia-Hernandez J. J., July 2015, FPGA-based
flexible hardware architecture for image interest point detection, International Journal
of Advanced Robotic Systems, Vol. 12, pp. 1-15
Xilinx, 2018, Zynq-7000 SoC and 7 series devices memory interface solutions, available
from:
https://www.xilinx.com/support/documentation/ip_documentation/mig_7series/v4_2/ug586_7Series_MIS.pdf
[last accessed August 2020]
Xilinx, 2011, 7 series FPGAs memory interface solutions, available from:
https://www.xilinx.com/support/documentation/ip_documentation/ug586_7Series_MIS.pdf
[last accessed August 2020]
Nishiguchi K., Inoue T., Tsuchiya A., Ogohara K., Kishine K., October 2019, Optimization
technique of memory traffic for FPGA-based image processing system, in 2019 International
SoC Design Conference (ISOCC), pp. 46-47
Tigadi A., Guhilot H., November 2018, Design of an arbiter for two systems accessing
a single DDR3 memory on a reconfigurable platform, International Journal of Information
Engineering and Electronic Business, Vol. 10, pp. 14-20
Helal K. A., Attia S., Ismail T., Mostafa H., June 2015, Priority-select arbiter:
An efficient round-robin arbiter, in 2015 IEEE 13th International New Circuits and
Systems Conference (NEWCAS), pp. 1-4
Yang Y., Wu R., Zhang L., Zhou D., January 2015, An asynchronous adaptive priority
round-robin arbiter based on four-phase dual-rail protocol, Chinese Journal of Electronics,
Vol. 24, No. 1, pp. 1-7
Micron Technology, 2018, 4Gb: x4, x8, x16 DDR3L SDRAM, available from:
https://www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/4gb_ddr3l.pdf
[last accessed August 2020]
Gonzalez R., Woods R., 2018, Digital Image Processing, 4th ed. Pearson
Author
He received the B.E. degree in electronic systems engineering from the University
of Shiga Prefecture in 2017.
Since the same year, he has been enrolled in a master's course at the Graduate School of Engineering,
the University of Shiga Prefecture.
His research interests include FPGA-based circuits and systems.
Toshiyuki Inoue received the B.S., M.S. and Ph.D. degrees in Electrical, Electronic
and Information Engineering from Osaka University, Osaka, Japan in 2010, 2012 and
2015, respectively.
He joined the Department of Electronic Systems Engineering, the University of Shiga
Prefecture, in 2017, and has been an Assistant Professor since 2017.
His research interests include RF circuits for wireless communication, wireless sensor
networks, radio-over-fiber technique and optoelectronics.
Dr. Inoue is a member of the Institute of Electronics, Information and Communication
Engineers (IEICE) of Japan and the Japan Society of Applied Physics (JSAP).
He received the Paper Award in 2013 from IEICE.
Rei Yamazaki is currently pursuing a B.E. degree in Electronic Systems Engineering
at the University of Shiga Prefecture, Japan.
His research interests include FPGA-based systems and image processing.
Kazunori Ogohara received the B.S., M.S. and Ph.D. degrees from the Graduate School of Science,
Kyoto University, Kyoto, Japan in 2005, 2007 and 2010, respectively.
He joined the Department of Electronic Systems Engineering, the University of Shiga
Prefecture, in 2013 as an assistant professor, and has been a lecturer since 2019.
His research interests include Martian atmospheric science and semantic segmentation
of Martian dust storms using machine learning.
Dr. Ogohara is a member of the Meteorological Society of Japan (JMS), Information
Processing Society of Japan (IPSJ), and Division for Planetary Sciences of the American
Astronomical Society (DPS-AAS).
He received the Outstanding Paper Award for Young Scientist from COSPAR in 2012.
Akira Tsuchiya received the B.E., M.E. and Ph.D. degrees in Communications and Computer
Engineering from Kyoto University, Kyoto, Japan, in 2001, 2003, and 2005, respectively.
From 2005, he was an Assistant Professor in the Department of Communications
and Computer Engineering, Graduate School of Informatics, Kyoto University.
Since 2017, he has been an Associate Professor in the Department of Electronic Systems
Engineering, the University of Shiga Prefecture, Shiga, Japan.
His research interests include modeling and design of on-chip passive components
for high-frequency CMOS, and high-speed analog circuit design.
He is a member of the IEEE, IEICE and IPSJ.