The Journal of
the Korean Society on Water Environment

The Journal of
the Korean Society on Water Environment

Bimonthly
  • ISSN : 2289-0971 (Print)
  • ISSN : 2289-098X (Online)
  • KCI Accredited Journal

Editorial Office

Title The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction
Authors 박정수 ( Jungsu Park )
DOI https://doi.org/10.15681/KSWE.2021.37.5.335
Page pp.335-343
ISSN 2289-0971
Keywords Clustering; Ensemble machine learning; Gradient boosting decision tree; Water quality prediction; Water supply system; XGBoost
Abstract Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems such as increased treatment cost and consequently, there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, making it challenging to predict SSC. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC in two fields monitoring stations were used to develop the model. The input variables were clustered in two groups with low and high ranges of Q using the k-means clustering algorithm. Then each group of data was separately used to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by mean squared error-observation standard deviation ratio (RSR) and root mean squared error. The RSR were 0.51 and 0.57 in the two monitoring stations for Model 2, respectively, while the model performance improved to RSR 0.46 and 0.55, respectively, for Model 1.