The Journal of the Korean Society on Water Environment
  1. Dept. of Civil & Environmental Engineering, Hanbat National University
  2. UnU Inc.
  3. Protist Research Team, Nakdonggang National Institute of Biological Resources



Keywords: Deep learning, Microalgae, Object detection, Water supply system, YOLOv3

1. Introduction

Algal blooms are among the most important issues in the management of drinking water supply systems. The overgrowth of algae has various harmful effects on water quality, such as unfavorable odor or taste (Codd et al., 2005; Paerl and Otten, 2013; WHO, 2004). The cell walls of diatoms are not removed by the regular disinfection process and often cause technical problems such as clogging of filtration beds in water treatment plants. Cyanobacteria release algal toxins into freshwater systems, which can directly harm human health.

Continuous monitoring of algae in freshwater bodies such as rivers and reservoirs is therefore essential. One of the most common and traditional monitoring methods is visual identification of algae under a microscope. However, this approach is laborious and time-consuming, so a rapid and less labor-intensive method for algae image identification is needed.

Object detection is a fundamental and continuously studied subject in computer vision research (Zhao et al., 2019). Object detection technology based on deep learning algorithms has made noticeable progress in recent decades. The convolutional neural network (CNN) is the most representative and widely used deep learning algorithm in object detection studies (LeCun et al., 2015; Zhao et al., 2019). The characteristics of the target image are extracted by computation processes called convolution and pooling and are then used for the classification of the target object. AlexNet, a deep learning algorithm based on the convolutional neural network, won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is considered one of the algorithms that demonstrated the practical applicability of deep learning in object detection (Krizhevsky et al., 2012; Russakovsky et al., 2015). Since AlexNet, various algorithms have been developed, and these models can be categorized into two types: one-stage models and two-stage models (Sultana et al., 2020; Zhao et al., 2019).
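As a minimal, self-contained illustration of these two operations (a toy sketch, not the networks used in this study), convolution slides a small kernel over the image and pooling downsamples the resulting feature map:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: at each position, take the sum of
    element-wise products between the kernel and the image patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size block, halving the spatial resolution for size=2."""
    h, w = feature.shape[0] // size, feature.shape[1] // size
    return feature[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # simple vertical-edge detector
feature_map = max_pool(conv2d(np.random.rand(8, 8), edge_kernel))
print(feature_map.shape)  # (3, 3): an 8x8 input reduced to a 3x3 feature map
```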

Regions with convolutional neural network features (R-CNN) is one of the first two-stage object detection models. In the first stage, the algorithm proposes multiple regions where a target object may be located. In the second stage, the model localizes the target object and classifies it using the CNN algorithm (Girshick et al., 2014; Zhao et al., 2019). Various two-stage object detection algorithms that improve on R-CNN have since been developed, such as spatial pyramid pooling (SPP), Fast R-CNN, Faster R-CNN, and Mask R-CNN (Girshick, 2015; He et al., 2017; He et al., 2015; Ren et al., 2015).

YOLO is considered one of the most representative one-stage object detection models, in which region proposal and classification are unified and processed in a single stage (Redmon et al., 2016; Sultana et al., 2020). Various other one-stage object detection models also exist, such as the single shot multibox detector (SSD) and RetinaNet (Lin et al., 2017; Liu et al., 2016). The YOLO model was continuously improved from version 1 to version 3, and Redmon and Farhadi (2018) proposed the third version, YOLOv3. The inference time of YOLOv3 ranges from 22 to 51 milliseconds depending on the resolution of the input images, which is much faster than that of models such as SSD and RetinaNet (Redmon and Farhadi, 2018). Although YOLOv3 is slightly less accurate than these two models, the noticeably faster inference time is an important advantage of YOLO as a real-time object detection model (Redmon and Farhadi, 2018). Object detection model development is a competitive field with various ongoing issues, so selecting an optimal model for a given research field remains the researcher's choice.

Recently, several studies have presented the practical possibility of using deep learning models such as YOLO for algae image detection (Pedraza et al., 2018; Salido et al., 2020). Pedraza et al. (2018) used the YOLO model for the classification of diatoms, detecting objects of nine diatom species in images with overall precision and recall of 0.74. More recently, Salido et al. (2020) used YOLO to classify 10 diatom species with a mean precision of 0.727. The composition of the input image data set, such as the number of target objects and the use of color, affects object detection model performance, yet related research on algal image detection is still at an early stage.

In this study, a deep learning object detection algorithm, YOLOv3, was used for the detection and classification of algae images obtained from freshwater. The model was trained and tested for the classification of five, 10, 20, and 30 target species so that the effect of the number of target objects on model performance could be analyzed and the practical applicability of the model verified. The effect of image color on model performance was also compared using the same data groups with grayscale photos.

2. Materials and Methods

2.1 Data sources

2.1.1 Image acquisition

A total of 1,114 photos containing 3,663 objects from 30 genera were used to develop the YOLOv3 algae image detection model (Table 1). The photos were acquired under a microscope (Eclipse Ni, NIKON, Japan) from algae cultivated in pure cultures.

Table 1. Algae images used for YOLOv3 model development
(O/X: genus included/excluded in the five, 10, 20, and 30 genera groups; the Sum row gives the number of genera in each group)

Genus | 5 | 10 | 20 | 30 | Photos | Labeled objects
Acutodesmus obliquus | O | O | O | O | 11 | 75
Ankistrodesmus falcatus | X | O | O | O | 20 | 32
Chlamydomonas asymmetrica | X | X | X | O | 65 | 393
Chlorella vulgaris | O | O | O | O | 82 | 712
Chlorococcum loculatum | X | X | O | O | 27 | 88
Chroomonas coerulea | X | X | O | O | 90 | 559
Closterium sp. | X | O | O | O | 65 | 129
Coelastrella sp. | X | X | X | O | 10 | 19
Coelastrum astroideum var. rugosum | X | X | X | O | 12 | 37
Cosmarium sp. | X | O | O | O | 35 | 154
Cryptomonas lundii | X | X | O | O | 37 | 36
Desmodesmus communis | O | O | O | O | 183 | 314
Diplosphaera chodatii | X | O | O | O | 3 | 38
Eudorina unicocca | X | X | X | O | 40 | 205
Euglena sp. | X | X | O | O | 58 | 76
Kirchneriella aperta | X | X | O | O | 24 | 81
Lithotrichon pulchrum | X | X | X | O | 11 | 55
Micractinium pusillum | X | X | O | O | 18 | 74
Micrasterias sp. | O | O | O | O | 8 | 8
Monoraphidium sp. | X | X | O | O | 34 | 66
Mychonastes sp. | X | X | X | O | 32 | 133
Nephrochlamys subsolitaria | X | X | O | O | 9 | 31
Pectinodesmus pectinatus | X | X | O | O | 68 | 69
Pediastrum duplex | X | X | X | O | 75 | 77
Pseudopediastrum boryanum | X | X | X | O | 28 | 29
Scenedesmus sp. | X | X | O | O | 9 | 13
Selenastrum capricornutum | O | O | O | O | 12 | 43
Sorastrum pediastriforme | X | O | O | O | 9 | 9
Tetrabaena socialis | X | X | X | O | 34 | 81
Tupiella speciosa | X | X | X | O | 5 | 27
Sum | 5 | 10 | 20 | 30 | 1,114 | 3,663

2.1.2 Input image data set and labeling

The algae images were divided into four groups containing five, 10, 20, and 30 genera (Table 1) to compare model performance across various numbers of target objects. Because image color can affect model performance, especially as the number of target genera increases, each data group was also prepared as grayscale photos to test the model's sensitivity to the colors of the algae images. The YOLOv3 model was therefore trained with eight different data sets. The ratio of data used for model training and testing was 7:3.
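For illustration, a minimal Python sketch of this preparation step, a 7:3 split plus grayscale copies. The study's actual pipeline was a C# frame using OpenCV 4.4; the folder names here are assumptions:

```python
import glob
import os
import random
import cv2  # OpenCV; illustrative only, not the authors' C# pipeline

random.seed(0)
paths = sorted(glob.glob("images/*.jpg"))  # hypothetical image folder
random.shuffle(paths)

# 7:3 split between training and test sets, as in the study
n_train = int(len(paths) * 0.7)
train_set, test_set = paths[:n_train], paths[n_train:]

# Grayscale copies of every photo for the color-sensitivity experiment
os.makedirs("images_gray", exist_ok=True)
for p in paths:
    gray = cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2GRAY)
    cv2.imwrite(p.replace("images/", "images_gray/"), gray)
```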

Each photo contains one or more cell objects, and each algal cell object was labeled manually for training and testing the YOLOv3 model using a labeling program developed in this study (Table 1). For example, there were 11 photos of Acutodesmus obliquus, each containing from one to several cell objects, so a total of 75 Acutodesmus obliquus objects were labeled. Each label includes the coordinates of the bounding box and the class of the target object, so that the model can identify the location and class of each cell object during the training process (Fig. 1).
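The internal format of the in-house labeling program is not given in the paper; the sketch below assumes the standard Darknet label format that the training tool used later (AlexeyAB/darknet) expects, one text line per object with the class index and the box center, width, and height normalized to the image size:

```python
def to_darknet_label(class_id, box, img_w, img_h):
    """Convert a pixel-coordinate box (xmin, ymin, xmax, ymax) into a
    Darknet-style label line: class x_center y_center width height,
    all normalized to [0, 1] relative to the image dimensions."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g., one cell (hypothetical class index 3) in a 1600x1200 photo
print(to_darknet_label(3, (820, 540, 980, 700), 1600, 1200))
```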

Fig. 1. Example of algae cell image labeling.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3DB7.png

2.2 Model development

2.2.1 YOLOv3 model

YOLOv3 predicts the location and class of objects in a single neural network pass, using a 53-layer convolutional neural network, Darknet-53, as the main model network (Redmon and Farhadi, 2018). YOLOv3 was continuously improved from YOLOv1 and YOLOv2 (Redmon et al., 2016; Redmon and Farhadi, 2017, 2018).

YOLOv1, the first version of the YOLO model, divides the input image into an S×S grid, where S = 7 was used to evaluate the model (Redmon et al., 2016). Each grid cell predicts B (assigned as 2) bounding boxes for object detection, and a confidence score is calculated for each bounding box.

The confidence score is defined as P×IOU, where P is the probability that the bounding box contains an object and IOU is the intersection over union (Redmon et al., 2016). IOU is calculated with Eq. (1), as illustrated in Fig. 2.

Fig. 2. Schematic of area of overlap and area of union for IOU calculation.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3DD7.png
(1)
$$\mathrm{IOU} = \frac{\text{area of overlap between bounding boxes}}{\text{area of union between bounding boxes}}$$
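Eq. (1) maps directly to code; a minimal sketch for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union (Eq. 1) of two axis-aligned boxes,
    each given as (xmin, ymin, xmax, ymax) in pixel coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping boxes
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142...
```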

Each bounding box contains five values: the coordinates (x, y) of the box center relative to the bounds of the grid cell, the width (w) and height (h) of the box, and the confidence score (Redmon et al., 2016). The model prediction is thus an S×S×(B×5+C) tensor, where C is the number of trained object classes. YOLOv1 was evaluated with S = 7, B = 2, and C = 20 on the PASCAL VOC data set, which gives a 7×7×30 prediction tensor (Redmon et al., 2016). The bounding box with the highest IOU with the ground truth is assigned as responsible for predicting an object. YOLOv2 improved on YOLOv1, using a 19-layer convolutional neural network, called Darknet-19, and anchor boxes to predict bounding boxes.

YOLOv3 has several important improvements over the previous YOLO versions. First, YOLOv3 uses Darknet-53 as the main model network. YOLOv3 also extracts feature maps at three different scales, with sizes of 13×13, 26×26, and 52×52 (Redmon and Farhadi, 2018; Zhao and Ren, 2019).

Each grid cell has three anchor boxes with different shapes and scales. Model structure diagrams of Darknet-53 and YOLOv3 can be found in previous studies (Pedraza et al., 2018; Tian et al., 2019; Zhao and Ren, 2019), and a simple schematic is shown in this study (Fig. 3). The attributes of each anchor box are the location of the object, the objectness score, and the class probabilities (Fig. 4) (Redmon and Farhadi, 2018). The objectness score, predicted for each bounding box, represents the probability that the box contains an object. Each bounding box also predicts the classes of the target object using independent logistic classifiers, so that multilabel classification is possible (Redmon and Farhadi, 2018).
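Putting Figs. 3 and 4 together, the raw YOLOv3 prediction at each scale can be read as an S×S×3×(5+n) tensor: an S×S grid, three anchor boxes per cell, and per box four coordinates, one objectness score, and n class scores. A small bookkeeping sketch, using the class counts of this study only as an example:

```python
def yolov3_output_shapes(n_classes):
    """Shape of the raw YOLOv3 prediction at each of its three scales:
    S x S grid cells, 3 anchor boxes per cell, and per box
    4 coordinates + 1 objectness score + n_classes class scores (Fig. 4)."""
    per_box = 4 + 1 + n_classes
    return [(s, s, 3, per_box) for s in (13, 26, 52)]

# For the 30-genera data set of this study:
for shape in yolov3_output_shapes(30):
    print(shape)  # (13, 13, 3, 35), (26, 26, 3, 35), (52, 52, 3, 35)
```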

Fig. 3. Schematic of YOLOv3 structure.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3DF7.png
Fig. 4. Attributes of a bounding box. n is the number of classes for prediction.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3E17.png

2.2.2 Model training and optimization

The eight input data sets were trained with YOLOv3 coded in C++. The model frame was programmed in C# using OpenCV 4.4 and the NVIDIA GPU Toolkit 11. The model was trained with Darknet YOLO (https://github.com/AlexeyAB/darknet), starting from the pre-trained Darknet53.conv.74 weights. The hyperparameters were the default values of the YOLOv3 model: batch size 64, learning rate 0.001, and max_batches equal to the number of classes × 2,000. The best model was determined by comparing model performance every 100 batches.
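For reference, a minimal sketch of how such a training run is typically launched with AlexeyAB's Darknet; the data and cfg file names here are hypothetical, not taken from the paper:

```python
import subprocess

n_classes = 30  # e.g., the 30-genera data set
max_batches = n_classes * 2000  # the max_batches rule used in the study

# Typical AlexeyAB/darknet invocation: a .data file listing the train/test
# image lists and class names, a yolov3 .cfg edited to batch=64,
# learning_rate=0.001, and max_batches=60000 for 30 classes, and the
# pre-trained Darknet53.conv.74 weights.
subprocess.run([
    "./darknet", "detector", "train",
    "algae.data",        # hypothetical data file
    "yolov3-algae.cfg",  # hypothetical cfg edited as above
    "darknet53.conv.74",
], check=True)
```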

2.3 Model evaluation

2.3.1 Precision and mean average precision

For object detection, the model predictions can be divided into four indicators as follows.

⋅True positive (TP): the number of actual objects that were correctly detected and classified,

⋅False positive (FP): the number of detections that do not correspond to an actual object,

⋅False negative (FN): the number of actual objects that the model failed to detect,

⋅True negative (TN): the number of actual negatives that were correctly predicted as negative.

Model performance can be evaluated by precision (PR) and recall (RE), defined from these four indicators (Eqs. 2-3).

(2)
$$PR = \frac{TP}{TP + FP}$$
(3)
$$RE = \frac{TP}{TP + FN}$$
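Eqs. (2) and (3) translate directly into code; a minimal sketch:

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 2) and recall (Eq. 3) from detection counts."""
    pr = tp / (tp + fp) if tp + fp > 0 else 0.0
    re = tp / (tp + fn) if tp + fn > 0 else 0.0
    return pr, re

# e.g., 40 correct detections, 5 spurious boxes, and 10 missed cells
print(precision_recall(40, 5, 10))  # (0.888..., 0.8)
```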

The precision-recall curve (P-R curve) represents the change of PR as RE varies over the interval from 0 to 1, and it is commonly used to consider both precision and recall in object detection model evaluation (Ozenne et al., 2015; Tian et al., 2019).

The average precision (AP) is the area under the P-R curve for each image class, representing the average precision over the whole recall interval. The mean average precision (mAP) is the average of AP over all image classes. Redmon and Farhadi (2018) showed that YOLOv3 performs well at an IOU threshold of 50%, and Zhao and Ren (2019) also noted that YOLOv3 performs strongly at an IOU of 50%. In this study, model performance was evaluated by PR and mAP using an IOU threshold of 50%.
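For reference, a sketch of how AP can be computed from a P-R curve using the common all-point interpolation; this shows the general technique, and Darknet's built-in mAP calculation may differ in detail:

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as the area under the P-R curve. Recalls must be sorted in
    ascending order. Precision at each recall level is replaced by the
    maximum precision at any higher recall (all-point interpolation),
    then the area is summed over the recall steps."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(p) - 2, -1, -1):  # make precision non-increasing
        p[i] = max(p[i], p[i + 1])
    steps = np.where(r[1:] != r[:-1])[0]  # indices where recall changes
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))

# mAP@50 is then the mean of per-class APs computed from detections
# kept at an IOU threshold of 50%, e.g.:
# map50 = np.mean([average_precision(r_c, p_c) for r_c, p_c in curves])
```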

3. Results and Discussion

3.1 Model performance comparison

The model performance for the eight data sets is compared in Fig. 5. The model shows better mAP for data sets with a smaller number of genera, for both color and grayscale images. The mAP was 81, 70, 52, and 41 for the data sets with five, 10, 20, and 30 genera, respectively. PR was higher than 0.8 for all data sets with color images, and the highest PR was observed for the data set with 10 genera. The model using grayscale images had about 11% and 4% higher PR for the data sets with five and 10 genera, respectively, while it had about 6% and 8% lower PR for the data sets with 20 and 30 genera.

Fig. 5. mAP and PR of YOLOv3 model for test data set.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3E18.png

The model shows better performance for data sets with a smaller number of genera when using grayscale images. On the other hand, higher PR was observed with color images for data sets with a larger number of genera. These results suggest that color images provide more useful information for object detection and improve model PR when a relatively large number of genera is classified.

3.2 Detection characteristics

A detailed analysis of model prediction results, including misclassified cases, provides useful information for understanding and improving model performance. The model developed in this study shows good performance for single-cell images. For photos with a single algae cell, the model predicted the algae image with more than 95% confidence for all four data sets (five, 10, 20, and 30 genera) with color images (Fig. 6). However, the model misclassified overlapped cells and cells with similar morphology. Small and crowded cells were also often misclassified. The misclassified cases can be summarized as follows.

Fig. 6. Detection confidence of Desmodesmus communis for five, 10, 20 and 30 genera data sets.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3E58.png

Case 1. Misclassification of overlapped images

The model increasingly misclassified overlapped images as the number of classes increased. For the data set with five genera, the model detected the image with more than 90% confidence (Fig. 7). However, the confidence decreased to 73% for 20 genera, and the model could not detect the image for 30 genera.

Fig. 7. Detection of Selenastrum capricornutum for five, 10, 20 and 30 genera models.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3E97.png

Case 2. Misclassification of morphologically similar images

The model tended to misclassify, or failed to detect, algae with similar morphology. For example, Selenastrum capricornutum and Pectinodesmus pectinatus have similar morphology, and the model could not detect either one (Fig. 8).

Fig. 8. Detection of Selenastrum capricornutum and Pectinodesmus pectinatus for 20 genera models.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3ED6.png

Case 3. Misclassification of small and crowded images

The model misclassified or could not detect small and crowded cells, even for five genera, which is considered a limitation of the current model (Fig. 9).

Fig. 9. Detection of Acutodesmus obliquus for five, 10, 20 and 30 genera models.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3F06.png

Examples of detailed model performance for Case 1 and Case 3 are summarized in Table 2. The number of correctly detected objects and the model accuracy decrease as the number of target genera increases in both cases.

Table 2. Accuracy of the five, 10, 20 and 30 genera models for the misclassification cases

Case | Metric | 5 | 10 | 20 | 30 | Reference figure
Case 1. Overlapped image | Correctly detected objects / total objects | 4/9 | 4/9 | 3/9 | 1/9 | Fig. 7
Case 1. Overlapped image | Accuracy | 95.4~100% | 84.9~99.8% | 34.8~85.0% | 50.6% | Fig. 7
Case 3. Small and crowded image | Correctly detected objects / total objects | 2/74 | 0/74 | 0/74 | 0/74 | Fig. 9
Case 3. Small and crowded image | Accuracy | 46.2~57.4% | - | - | - | Fig. 9

The effect of color on model performance was also analyzed by comparing the model simulation results with grayscale images. No noticeable difference was observed between color and grayscale images for the five and 10 genera data sets (Fig. 10). However, detection performance differed between color and grayscale images for the 20 and 30 genera data sets, and the color image data sets tend to show better performance as the number of genera increases. This suggests that morphological characteristics may be sufficient to detect algae objects when the number of classes is small, but that the additional information in color images helps object detection as the number of classes increases.

Fig. 10. Detection confidence of Micrasterias sp. with color and grayscale images.
../../Resources/kswe/KSWE.2021.37.4.275/PIC3F55.png

4. Conclusion

In this study, an algae image detection model using YOLOv3 was developed. The model was trained with four data sets containing five, 10, 20, and 30 genera. The effect of image color on model performance was also compared by training the model on grayscale versions of the four data sets.

The PR of the model was more than 0.8 for all four data sets, and the mAP ranged from 41 to 81. These results suggest the practical applicability of the YOLOv3 algae detection model. The model trained with grayscale images showed similar performance for the five and 10 genera data sets. On the other hand, the models trained with color images showed better performance as the number of classes increased, which indicates that the additional information in color images is needed for proper object detection as the number of classes grows.

The analysis of misclassified cases suggests that model accuracy decreases especially when algae cells are morphologically similar. Model performance also decreased for small or crowded images.

Deep learning models cannot directly incorporate existing physical, chemical, and biological causality into their training process, which is an important characteristic of black-box models. Despite this limitation, advanced deep learning models are already actively applied in various fields and in real life and show sufficient performance. Deep learning models are influenced by the composition of the input data, and model performance can be improved through proper organization of input data that reflects the various characteristics of the target object. The model developed in this study shows the possibility of using a deep learning model, YOLOv3, for algae image detection. The results provide practical technology for the monitoring and management of algal blooms in rivers and reservoirs. Further analysis of the results, including the misclassified cases, suggests that enlarging the input data set with various characteristics, including small and crowded images, would improve model performance; this is suggested as a subject for future study.

Acknowledgement

This work was supported by the Korea Environment Industry & Technology Institute (KEITI) through the Aquatic Ecosystem Conservation Research Program, funded by the Korea Ministry of Environment (MOE) (2020003030006), and by the Nakdonggang National Institute of Biological Resources (NNIBR), funded by the Ministry of Environment (MOE) of the Republic of Korea (NNIBR202101103).

References

1. Codd G. A., Morrison L. F., Metcalf J. S., 2005, Cyanobacterial toxins: Risk management for health protection, Toxicology and Applied Pharmacology, Vol. 203, pp. 264-272.
2. Girshick R., 2015, Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448.
3. Girshick R., Donahue J., Darrell T., Malik J., 2014, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587.
4. He K., Gkioxari G., Dollár P., Girshick R., 2017, Mask R-CNN, Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969.
5. He K., Zhang X., Ren S., Sun J., 2015, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 37, pp. 1904-1916.
6. Krizhevsky A., Sutskever I., Hinton G. E., 2012, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, Vol. 25, pp. 1097-1105.
7. LeCun Y., Bengio Y., Hinton G., 2015, Deep learning, Nature, Vol. 521, pp. 436-444.
8. Lin T. Y., Goyal P., Girshick R., He K., Dollár P., 2017, Focal loss for dense object detection, Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988.
9. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C. Y., Berg A. C., 2016, SSD: Single shot multibox detector, Proceedings of the European Conference on Computer Vision, pp. 21-37.
10. Ozenne B., Subtil F., Maucort-Boulch D., 2015, The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases, Journal of Clinical Epidemiology, Vol. 68, pp. 855-859.
11. Paerl H. W., Otten T. G., 2013, Harmful cyanobacterial blooms: Causes, consequences, and controls, Microbial Ecology, Vol. 65, pp. 995-1010.
12. Pedraza A., Bueno G., Deniz O., Ruiz-Santaquiteria J., Sanchez C., Blanco S., Borrego-Ramos M., Olenici A., Cristobal G., 2018, Lights and pitfalls of convolutional neural networks for diatom identification, Proceedings of Optics, Photonics, and Digital Technologies for Imaging Applications V, 106790G.
13. Redmon J., Farhadi A., 2017, YOLO9000: Better, faster, stronger, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271.
14. Redmon J., Farhadi A., 2018, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767.
15. Redmon J., Divvala S., Girshick R., Farhadi A., 2016, You only look once: Unified, real-time object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788.
16. Ren S., He K., Girshick R., Sun J., 2015, Faster R-CNN: Towards real-time object detection with region proposal networks, arXiv preprint arXiv:1506.01497.
17. Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., 2015, ImageNet large scale visual recognition challenge, International Journal of Computer Vision, Vol. 115, pp. 211-252.
18. Salido J., Sánchez C., Ruiz-Santaquiteria J., Cristóbal G., Blanco S., Bueno G., 2020, A low-cost automated digital microscopy platform for automatic identification of diatoms, Applied Sciences, Vol. 10, 6033.
19. Sultana F., Sufian A., Dutta P., 2020, A review of object detection models based on convolutional neural network, Intelligent Computing: Image Processing Based Applications, pp. 1-16.
20. Tian Y., Yang G., Wang Z., Wang H., Li E., Liang Z., 2019, Apple detection during different growth stages in orchards using the improved YOLO-V3 model, Computers and Electronics in Agriculture, Vol. 157, pp. 417-426.
21. World Health Organization (WHO), 2004, Guidelines for drinking-water quality, World Health Organization, Vol. 1.
22. Zhao K., Ren X., 2019, Small aircraft detection in remote sensing images based on YOLOv3, Proceedings of IOP Conference Series: Materials Science and Engineering, 012056.
23. Zhao Z. Q., Zheng P., Xu S. T., Wu X., 2019, Object detection with deep learning: A review, IEEE Transactions on Neural Networks and Learning Systems, Vol. 30, No. 11, pp. 3212-3232.