The Journal of
the Korean Society on Water Environment

Bimonthly
  • ISSN : 2289-0971 (Print)
  • ISSN : 2289-098X (Online)
  • KCI Accredited Journal

  1. Department of Civil and Environmental Engineering, Hanbat National University



Keywords: Clustering, Ensemble machine learning, Gradient boosting decision tree, Water quality prediction, Water supply system, XGBoost

1. Introduction

Stable water quality management at drinking water intake sources requires continuous monitoring of current water quality together with prediction of water quality changes. The water quality of intake sources such as rivers and reservoirs is affected by various pollutants, including organic matter and nutrients, and suspended sediment in the water column is also one of the important factors affecting the water quality and aquatic ecology of intake sources (Packman and MacKay, 2003; Singer et al., 2013). In addition, increases in suspended sediment concentration (SSC) caused by higher flow during rainfall cause high turbidity at intake sources and affect drinking water treatment in various ways, including higher treatment costs and water quality incidents (Lin et al., 2004; Park and Lee, 2020).

Applications of models based on machine learning algorithms have recently been growing rapidly across many fields, and in the water environment field, research on applying these advanced data analysis algorithms to water quality prediction and management is ongoing (Haghiabi et al., 2018; Li et al., 2021; Muhammad et al., 2015). Many machine learning models have been applied to turbidity prediction: the artificial neural network (ANN), a representative machine learning algorithm; the support vector machine (SVM); random forest (RF), an ensemble machine learning algorithm; and long short-term memory (LSTM), a recurrent neural network (RNN) architecture that performs well on time-series data. LSTM belongs to the deep learning models, which overcame limitations of conventional ANN models and marked a major advance in machine learning (Park and Lee, 2020; Stevenson and Bravo, 2019; Wang et al., 2021).

Ensemble machine learning models improve predictive performance by combining multiple models, called weak learners; RF and gradient boosting decision tree (GBDT) are representative examples (Sutton, 2005; Zhang, Qian et al., 2018). Both can be used for regression as well as classification and, given sufficient input data, achieve high predictive performance. They therefore remain among the most widely used machine learning models, and their use in the water quality field is gradually increasing (Hollister et al., 2016; Uddameri et al., 2020).

Machine learning models can predict dependent variables that have complex non-linear relationships with the independent variables, without requiring separate coefficients derived from physical or chemical relationships. Their performance depends strongly on feature engineering, which covers the choice of input variables, their measurement frequency, and appropriate preprocessing; constructing appropriate input variables is therefore important for optimizing model performance (Park, 2021).

The suspended sediment concentration (SSC) in rivers and reservoirs used as intake sources is affected by natural factors such as rainfall, the characteristics of sediment sources, the distance between the sediment sources and the monitoring site, the number of dry days preceding a rainfall event, and peak rainfall intensity (Hicks et al., 2000; Park and Hunt, 2017; Warrick, 2015; Warrick et al., 2013), as well as by human activities such as construction and agriculture and by climate change (Gray et al., 2016; Gray et al., 2015). River discharge (Q) is one of the most important factors controlling SSC. Because SSC also depends on many environmental factors other than Q, however, the same Q at the same location can produce very different SSC depending on the year, the season, and antecedent rainfall conditions, and the correlation between SSC and Q can differ between ranges of Q (Walling, 1977; Warrick, 2015).

In this study, a model predicting SSC from Q as the independent variable was developed using gradient boosting decision tree (GBDT), one of the representative and still widely used ensemble machine learning models. To reflect the characteristics of the input data in model development, the input data were clustered according to Q using k-means clustering (KMC), an unsupervised machine learning algorithm that groups data by their characteristics, and a GBDT model optimized for each cluster was then built to predict SSC. For comparison, a GBDT model using all the input data without clustering was also built, and the two approaches were compared to analyze how accounting for input data characteristics affects model performance.

2. Materials and Methods

2.1 Data sources

The United States Geological Survey (USGS) operates monitoring stations across the United States for land management and research, measures discharge and SSC over long periods, and publishes the results. This study used daily Q and SSC measurements from two USGS stations on Redwood Creek, California, USA: Blue Lake and Orick (Table 1) (USGS, 2014). Redwood Creek, on the west coast of the United States, lies in a Mediterranean climate zone: the wet season begins around October and lasts until the following spring, and a dry season then continues until around September. The Orick station is located about 6 km upstream of the river mouth, and the creek flows past the Blue Lake station and then the Orick station before entering the Pacific Ocean (USGS, 2009).

Table 1. Research sites
| Site | Watershed area (km²) | Latitude (N) | Longitude (W) | USGS site number | Observation period |
|---|---|---|---|---|---|
| Blue Lake | 175 | 40°54′22″ | 123°48′51″ | 11481500 | Oct 1, 1972 to April 30, 1992 |
| Orick | 717 | 41°17′58″ | 124°03′00″ | 11482500 | March 19, 1970 to April 30, 1992 |

2.2 Model development

In this study, GBDT, an ensemble machine learning model, was used to predict river SSC. Together with RF, GBDT is one of the representative ensemble machine learning models. RF generates many weak learners based on decision trees (DT) and computes the prediction as the average of the outputs produced independently by each weak learner. GBDT instead builds its weak learners sequentially: each new tree is fitted to the residuals between the observed values and the predictions of the trees built so far (more generally, to the negative gradient of the loss function), so that data the previous stages predicted poorly carry more weight in training and model performance improves (Chen and Guestrin, 2016; Friedman, 2001; Zhang, Bouadi et al., 2018).
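To make the sequential idea concrete, the following is a minimal pure-Python sketch of gradient boosting for squared-error loss, using one-split decision trees (stumps) on a single feature. It is not XGBoost, which adds regularization, second-order gradient information, and many engineering details; the function names, the learning rate, and the number of stages are illustrative assumptions.

```python
def fit_stump(x, r):
    """Least-squares decision stump on one feature: returns (threshold, left, right)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2  # candidate split between consecutive distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    if best is None:  # only one distinct x value: fall back to a constant stump
        mean = sum(r) / len(r)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, xi):
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def gradient_boost(x, y, n_stages=50, learning_rate=0.1):
    """Sequentially fit stumps, each one to the residuals of the ensemble so far."""
    base = sum(y) / len(y)  # stage 0: constant model (mean of y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_stages):
        # For squared-error loss the negative gradient is proportional to y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each new tree's contribution by the learning rate.
        pred = [pi + learning_rate * stump_predict(stump, xi) for xi, pi in zip(x, pred)]
    return {"base": base, "stumps": stumps, "lr": learning_rate}


def boost_predict(model, xi):
    return model["base"] + model["lr"] * sum(stump_predict(s, xi) for s in model["stumps"])


if __name__ == "__main__":
    q = [1, 2, 3, 4, 5, 6, 7, 8]           # toy discharge values
    ssc = [10, 10, 10, 10, 50, 50, 50, 50]  # toy SSC with a step at higher Q
    model = gradient_boost(q, ssc, n_stages=50)
    print(round(boost_predict(model, 1), 2), round(boost_predict(model, 8), 2))
```

On this step-shaped toy series, each stage removes a fixed fraction (the learning rate) of the remaining residual, so adding stages moves the predictions steadily toward the observations.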

The GBDT model is optimized to minimize an objective function (J) consisting of a loss function (L), which measures the difference between the observed value (y_obs,i) and the predicted value (y_pred,i) of the target variable, and a regularization function (Ω) of each individual DT model (f_k) (Eq. 1) (Chen and Guestrin, 2016; Shin et al., 2020; Zhang, Qian et al., 2018). The model was built with the XGBoost regressor (XGB), one of the most widely used GBDT implementations, and was configured to predict the dependent variable SSC from the independent variable Q. In addition, the daily data were lagged by one day, so that for time t the previous day's values Q_{t-1} and SSC_{t-1} were added as inputs. The model was optimized by grid search with 10-fold cross-validation, in which the input data were divided into 10 subsets. Model building and optimization were carried out with Scikit-learn, an open-source Python library (Pedregosa et al., 2011).
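The lagged inputs and the 10-fold split described above can be sketched in plain Python as follows. These are hypothetical helpers, not the authors' code: they assume the daily series are aligned lists with None for missing days, drop any row that touches a missing value, and form contiguous folds in the way scikit-learn's KFold does when shuffle=False. In practice the folds would be handed to a grid search such as scikit-learn's GridSearchCV wrapping xgboost.XGBRegressor with cv=10; the hyperparameter grid used in the study is not reported here.

```python
def make_lagged_rows(q, ssc):
    """Return (X, y) with X rows [Q_t, Q_{t-1}, SSC_{t-1}] and y = SSC_t.

    q and ssc are aligned daily lists; None marks a missing day.
    Any row touching a missing value is dropped (listwise deletion).
    """
    X, y = [], []
    for t in range(1, len(q)):
        row = [q[t], q[t - 1], ssc[t - 1]]
        if any(v is None for v in row) or ssc[t] is None:
            continue
        X.append(row)
        y.append(ssc[t])
    return X, y


def kfold_indices(n, k=10):
    """Yield (train, validation) index lists for k contiguous folds.

    The first n % k folds get one extra sample, mirroring scikit-learn's
    KFold(n_splits=k, shuffle=False).
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, val in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val


if __name__ == "__main__":
    X, y = make_lagged_rows([1.0, 2.0, None, 4.0, 5.0], [10.0, 20.0, 30.0, None, 50.0])
    print(X, y)  # [[2.0, 1.0, 10.0]] [20.0]
```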

(1)
$$J = \sum_{i=1}^{n} L\left(y_{obs,i},\, y_{pred,i}\right) + \sum_{k=1}^{K} \Omega\left(f_k\right)$$
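As a worked illustration of Eq. 1, the sketch below evaluates J for a squared-error loss and the tree penalty used by XGBoost, Ω(f) = γT + ½λΣw_j², where T is the number of leaves of tree f and w_j are its leaf weights (Chen and Guestrin, 2016). The squared-error loss and the numbers in the example are assumptions for illustration; in XGBRegressor, γ and λ correspond to the gamma and reg_lambda parameters.

```python
def omega(leaf_weights, gamma=0.0, lam=1.0):
    """Penalty of one tree: gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_obs, y_pred, trees, gamma=0.0, lam=1.0):
    """J = sum_i L(y_obs_i, y_pred_i) + sum_k Omega(f_k), with squared-error L.

    trees: one list of leaf weights per tree f_k.
    """
    loss = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
    penalty = sum(omega(w, gamma, lam) for w in trees)
    return loss + penalty


if __name__ == "__main__":
    # Loss 0.25 + 0.25 = 0.5; penalty 0.1 * 2 + 0.5 * 1.0 * (1 + 1) = 1.2; J = 1.7
    print(objective([3.0, 5.0], [2.5, 5.5], [[1.0, -1.0]], gamma=0.1, lam=1.0))
```

The penalty grows with the number of leaves and the size of the leaf weights, which is how the second term of Eq. 1 discourages overly complex trees.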

2.3 Clustering of input variables

KMC, an unsupervised learning model, was used to cluster the input data of the XGB model. KMC assigns the input data to a predefined number of clusters, computes the Euclidean distance between each input value (x_i) and the mean of its cluster (μ_j), and determines the final cluster memberships so as to minimize the sum of squared distances (Eq. 2), where the inner sum runs over the input data assigned to cluster j (Ahmad and Dey, 2007; Ayub et al., 2016; Song, 2017). KMC was performed with the Python Scikit-learn library (Pedregosa et al., 2011).

(2)
$$\sum_{j=1}^{k} \sum_{x_i \in R_j} \left\lVert x_i - \mu_j \right\rVert^2$$
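For a single feature such as daily Q, Eq. 2 can be minimized with Lloyd's algorithm, which alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its members. The sketch below is a simplified stand-in for scikit-learn's KMeans under stated assumptions: it uses a deterministic quantile-based initialization instead of KMeans's default k-means++ initialization, runs a single initialization, and returns clusters sorted by centroid so that index 0 corresponds to the low-Q class.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm in 1-D; returns (centroids, labels, sse), low to high."""
    xs = sorted(values)
    n = len(xs)
    # Quantile-based initialisation (an assumption, chosen for reproducibility).
    centroids = [float(xs[int((i + 0.5) * n / k)]) for i in range(k)]

    def assign(cs):
        # Assignment step: index of the nearest centroid for every value.
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(centroids)
    sse = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    # Relabel so that cluster 0 has the lowest centroid (Class 1 = low Q).
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new_idx for new_idx, old in enumerate(order)}
    return [centroids[j] for j in order], [rank[lab] for lab in labels], sse


if __name__ == "__main__":
    print(kmeans_1d([1, 2, 3, 100, 101, 102], k=2))
```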

2.4 Model evaluation

The SSC prediction performance of the XGB models was evaluated with the root mean square error (RMSE) and the RMSE-observations standard deviation ratio (RSR) (Eq. 3 and 4). RMSE expresses the magnitude of the prediction error in the units of the predicted variable (mg/L for SSC); the closer RMSE is to 0, the better the predictive performance. RSR normalizes RMSE by the standard deviation of the observations, so it allows performance to be compared between models and sites with different magnitudes of SSC. RSR ranges from 0 (perfect prediction) to large positive values; an RSR of 0.7 or less is generally considered satisfactory, and values closer to 0 indicate better performance (Bennett et al., 2013; Moriasi et al., 2007).

(3)
$$RMSE = \sqrt{\frac{\sum_{t=1}^{n}\left(Y_{t,obs} - Y_{t,pred}\right)^2}{n}}$$
(4)
$$RSR = \frac{\sqrt{\sum_{t=1}^{n}\left(Y_{t,obs} - Y_{t,pred}\right)^2}}{\sqrt{\sum_{t=1}^{n}\left(Y_{t,obs} - \overline{Y}_{obs}\right)^2}}$$

where

    $Y_{t,obs}$: observed value at time t,

    $Y_{t,pred}$: predicted value at time t,

    $\overline{Y}_{obs}$: mean of observed values,

    $n$: number of observations.
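Eq. 3 and 4 translate directly into code. The minimal sketch below (plain Python, hypothetical function names) also shows why RSR is not bounded by 1: a model that always predicts the observed mean scores exactly 1, and a worse model scores higher.

```python
import math


def rmse(obs, pred):
    """Eq. 3: root mean square error, in the units of the variable (e.g. mg/L)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rsr(obs, pred):
    """Eq. 4: RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)))
    den = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    return num / den


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(rmse(obs, [1.0, 2.0, 3.0, 5.0]))  # 0.5
    print(rsr(obs, [1.0, 2.0, 3.0, 5.0]))   # about 0.447
    print(rsr(obs, [2.5] * 4))              # 1.0: no better than the mean
    print(rsr(obs, [0.0] * 4))              # about 2.449: RSR can exceed 1
```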

3. Results and Discussion

3.1 Characteristics of input variables and preprocessing of missing values

Table 2 presents basic statistics of the input data used for model development. The SSC records contain 15% missing values at Blue Lake and 13% at Orick. When the input data of a machine learning model contain missing values, they are typically either removed or estimated, for example by interpolation or by k-nearest neighbors, which fills a gap with the mean of neighboring values; the preprocessing method should be chosen with the characteristics of the input data in mind.

Table 2. Characteristics of input variables
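| Site | Variable | Average | Min | Max | Standard deviation |
|---|---|---|---|---|---|
| Blue Lake | Q (m³/s) | 6.97 | 0.05 | 236.73 | 11.87 |
| Blue Lake | SSC (mg/L) | 117.44 | 0 | 11,200 | 427.69 |
| Orick | Q (m³/s) | 30.60 | 0.06 | 1,135.51 | 56.33 |
| Orick | SSC (mg/L) | 158.70 | 0 | 9,610 | 474.69 |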

In northern California, where the input data were measured, the wet season begins in October and rainfall continues until around February of the following year, after which a dry season with almost no rainfall persists through spring and summer. Most of the missing SSC values occurred during this dry season (March to September), when there is little rainfall and both Q and SSC are low. Therefore, no gap filling was performed in this study; instead, days with missing values were removed from the input data. Considering this rainfall pattern, the data were also split into training and testing sets at October, when the dry season ends and a new wet season begins: data from October 1, 1985 onward at Blue Lake and from October 1, 1984 onward at Orick were used for testing (Fig. 1). Excluding missing values, training and testing used measurements from 4,271 and 1,792 days at Blue Lake and from 4,853 and 2,157 days at Orick, respectively, giving training:testing ratios of 0.70:0.30 at Blue Lake and 0.69:0.31 at Orick.
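The missing-value handling and the date-based split described above can be sketched as follows. The record layout (date, Q, SSC) and the function names are assumptions for illustration; the split dates and day counts are those reported in the text.

```python
from datetime import date


def split_by_date(records, test_start):
    """records: iterable of (day, q, ssc) tuples with None for missing values.

    Days with a missing Q or SSC are dropped; the remaining days before
    test_start go to training and those on or after it go to testing.
    """
    complete = [r for r in records if r[1] is not None and r[2] is not None]
    train = [r for r in complete if r[0] < test_start]
    test = [r for r in complete if r[0] >= test_start]
    return train, test


def split_ratio(n_train, n_test):
    """Training:testing proportions, rounded to two decimals."""
    total = n_train + n_test
    return round(n_train / total, 2), round(n_test / total, 2)


if __name__ == "__main__":
    # Blue Lake tests from date(1985, 10, 1); Orick tests from date(1984, 10, 1).
    print(split_ratio(4271, 1792))  # (0.7, 0.3)
    print(split_ratio(4853, 2157))  # (0.69, 0.31)
```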

Fig. 1. Training and testing data.

3.2 Clustering of input SSC

Using KMC, the training data were divided into two clusters, a low-Q cluster (Class 1) and a high-Q cluster (Class 2), which were then used to build the XGB model. Its performance was compared with that of a model built from all the data without clustering to analyze how accounting for input data characteristics affects model performance (Table 3 and Fig. 2).

Table 3. Clustering of input variables for the model training
| Site | Blue Lake | Blue Lake | Orick | Orick |
|---|---|---|---|---|
| Class | Class 1 (low range) | Class 2 (high range) | Class 1 (low range) | Class 2 (high range) |
| Max Q (m³/s) | 22.6 | 236.7 | 96.0 | 1,135.5 |
| Number of observations | 3,923 | 348 | 4,393 | 460 |

Fig. 2. Distribution of clustered input variables for the model training.

3.3 Model simulation result

Model 1. Separated model

Model 1 was built by training a separate model for each cluster identified by Q: Class 1, with low Q, and Class 2, with high Q. In testing, each testing datum was predicted by the model built for the class to which it belongs.
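One way to route testing data to the class-specific models is to assign each day to the nearest cluster centroid, which is what scikit-learn's KMeans.predict does for new samples. The text does not state how testing data were assigned to classes, so this rule, the centroids, and the stand-in models below are assumptions; in the study each model would take the lagged inputs rather than Q alone.

```python
def assign_class(q, centroids):
    """Index of the nearest centroid (1-D Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: abs(q - centroids[j]))


def predict_separated(q_values, centroids, models):
    """Route each Q to the model trained on its cluster; models[j] is any callable."""
    return [models[assign_class(q, centroids)](q) for q in q_values]


if __name__ == "__main__":
    # Hypothetical centroids and stand-in models for the low-Q and high-Q classes.
    centroids = [2.0, 101.0]
    models = [lambda q: 10 * q, lambda q: 5 * q]
    print(predict_separated([1, 60, 200], centroids, models))  # [10, 300, 1000]
```

With two centroids, this rule is equivalent to a single threshold at their midpoint (51.5 in the toy example).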

Model 2. Combined model

For comparison with Model 1, which reflects the input data characteristics through clustering, Model 2 was trained and tested on all the input data without dividing them by Q.

In testing, Model 1, optimized separately for the low-Q and high-Q ranges identified by clustering, outperformed Model 2, built from all the data, at both Blue Lake and Orick. This indicates that the performance of the XGB model can be improved by building the model with the characteristics of the input data taken into account (Fig. 3).

Fig. 3. A comparison of model evaluation results.

Q์˜ ๋ฒ”์œ„์™€ ์ƒ๊ด€์—†์ด ์ „์ฒด ์ž…๋ ฅ์ž๋ฃŒ๋ฅผ ๋ชจ๋‘ ์ด์šฉํ•˜์—ฌ ๊ตฌ์ถ•ํ•œ Model 2์˜ ๊ฒฝ์šฐ Blue Lake์™€ Orick์—์„œ RSR์ด ๊ฐ๊ฐ 0.51๊ณผ 0.57๋กœ ๋ถ„์„๋˜์—ˆ์œผ๋‚˜, Model 1์˜ RSR์€ Blue Lake์™€ Orick์—์„œ ๊ฐ๊ฐ 0.46 ๋ฐ 0.55๋กœ ๋ถ„์„๋˜์–ด ๊ฐœ์„ ๋œ SSC ์˜ˆ์ธก์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค. RMSE๋Š” Blue Lake์™€ Orick์—์„œ Model 2์˜ ๊ฒฝ์šฐ ๊ฐ๊ฐ 117.10๊ณผ 124.04๋กœ Model 1์˜ ๊ฒฝ์šฐ ๊ฐ๊ฐ 104.05์™€ 118.95๋กœ ๋ถ„์„๋˜์–ด, RSR๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋‘์ง€์  ๋ชจ๋‘์—์„œ Model 1์„ ์‚ฌ์šฉํ•  ๋•Œ ์„ฑ๋Šฅ์ด ๊ฐœ์„ ๋˜์—ˆ๋‹ค.

To check the results visually, the observed and predicted values of the testing data for Model 1 and Model 2 are compared in Fig. 4. Black circles show observed versus predicted values for the Model 1 sub-model optimized for the low-Q range, blue squares for the Model 1 sub-model optimized for the high-Q range, and red triangles for Model 2, built from all the data. At both Blue Lake and Orick, the predictions of Model 1, optimized separately for the low-Q and high-Q ranges, tend to lie closer to the 1:1 line than those of Model 2.

Fig. 4. A comparison of model predictions.

3.4 Comparison with arbitrarily separated model

In this study, KMC was applied to divide the input data according to their characteristics. To check the performance gain from applying KMC, Model 1 was compared with a model built from input data divided arbitrarily, without KMC.

The input data were divided at Q = 2.6 m³/s at Blue Lake and Q = 10.5 m³/s at Orick, so that at each site 50% of the training data fell into the high-Q range and 50% into the low-Q range. Then, as for Model 1, a model was optimized separately for the upper 50% and the lower 50% of Q, and its performance was analyzed. At Blue Lake, RMSE and RSR were 112.35 and 0.49, respectively: better than Model 2, which used all the data, but a smaller improvement than Model 1, which was built from the input data clustered by KMC. At Orick, RMSE and RSR were 124.73 and 0.57, respectively, similar to Model 2. Thus, dividing the training data into the upper and lower 50% improved performance only slightly or hardly at all compared with the model using all the data, unlike clustering the input data with KMC.
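The arbitrary split above is a median split of the training Q: half of the training days fall on each side of the threshold, in contrast to the strongly unequal KMC classes in Table 3. A minimal sketch, with a hypothetical helper name and toy values rather than the study data:

```python
def median_split(q_values):
    """Split Q at its median; returns (threshold, low, high)."""
    s = sorted(q_values)
    n = len(s)
    mid = n // 2
    thr = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    low = [q for q in q_values if q <= thr]
    high = [q for q in q_values if q > thr]
    return thr, low, high


if __name__ == "__main__":
    # Skewed toy flows: many low-flow days and a few storm days.
    thr, low, high = median_split([0.5, 0.8, 1.0, 1.2, 2.0, 3.0, 40.0, 200.0])
    print(thr, len(low), len(high))  # 1.6 4 4
```

On skewed flow data the median threshold sits well inside the low-flow range, so the "high" group still mixes ordinary days with storm days; with tied values the two groups may also differ slightly in size.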

3.5 Optimal clustering

The elbow method determines the optimal number of clusters by running KMC with an increasing number of clusters k, computing the sum of squared errors (SE) for each k, and choosing the point beyond which further increases in k reduce SE only slightly (Park, 2018; Zhang, Bouadi et al., 2018). To identify the optimal number of clusters for the data used in model development, an elbow analysis was performed: KMC was applied to the Q values of the training data while k was increased from 1 to 10, and the change in SE was analyzed. Dividing the input data into two clusters reduced SE sharply to less than half of its initial value (k = 1). SE continued to decrease as the number of clusters increased but changed little beyond k = 6, indicating that the optimal number of clusters is around six (Fig. 5).
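The elbow analysis can be sketched by computing SE for increasing k. The sketch below re-implements a compact one-dimensional k-means (Lloyd's algorithm with a deterministic quantile initialization, which is an assumption; scikit-learn's KMeans reports the same quantity as its inertia_ attribute) and returns the SE curve. Choosing the elbow is left to inspection, as in the text, and the toy data are illustrative only.

```python
def kmeans_sse_1d(values, k, n_iter=100):
    """Sum of squared errors (SE) of 1-D k-means after Lloyd's algorithm."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic quantile initialisation (an assumption for reproducibility).
    centroids = [float(xs[int((i + 0.5) * n / k)]) for i in range(k)]
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    # SE: squared distance from each value to its nearest final centroid.
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


def elbow_curve(values, k_max=10):
    """SE for k = 1..k_max (k_max must not exceed the number of values)."""
    return [kmeans_sse_1d(values, k) for k in range(1, k_max + 1)]


if __name__ == "__main__":
    toy_q = [1, 2, 3, 100, 101, 102]
    for k, se in enumerate(elbow_curve(toy_q, k_max=3), start=1):
        print(k, se)
```

For the toy data, SE drops from 14705.5 at k = 1 to 4.0 at k = 2 and then barely changes, which is the kind of bend the elbow method looks for.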

Fig. 5. Result of the elbow analysis.

Dividing the training data into k = 3 to 6 clusters with KMC assigned the largest share of the data to the lowest-Q range (Fig. 6). The share of data in each cluster varied with k. With k = 3, 78% of the data at Blue Lake and 85% at Orick were assigned to the lowest-Q range; this share decreased as k increased, and with k = 6 it was 62% at Blue Lake and 67% at Orick. The highest-Q range contained the fewest data: 134 days at Blue Lake and 50 days at Orick with k = 3, and only 1 day at Blue Lake and 4 days at Orick with k = 6. With three or more clusters, the high-Q clusters did not contain enough data to build machine learning models, so no additional cluster-specific models were built in this study.

Fig. 6. Distribution of the clustered training data in Redwood Creek at Orick, California, USA.

*Note: Each color represents a different cluster.

The performance of machine learning models is affected in many ways by the characteristics of their input data. This study showed that preprocessing the input data by clustering them according to their characteristics can improve the performance of a machine learning model. Further research is needed on improving machine learning models by reflecting various characteristics of their input data.

4. Conclusion

In this study, the input data were clustered according to their characteristics using KMC, a model predicting SSC with XGB (Model 1) was built, and the effect of input data clustering on model performance was analyzed. The models were built from long-term daily Q and SSC measurements at two USGS monitoring stations, Blue Lake and Orick, on Redwood Creek, California, USA. Model performance was evaluated with RMSE and RSR. For comparison, a model using all the input data without clustering (Model 2) was also built and its predictive performance was analyzed.

Model 2, which did not account for input data characteristics, had RSR values of 0.51 at Blue Lake and 0.57 at Orick. Model 1, in which the input data were divided by clustering into low-Q and high-Q clusters and a model was optimized for each, improved RSR to 0.46 at Blue Lake and 0.55 at Orick. RMSE likewise indicated better performance for Model 1, providing a case in which building a machine learning model with the characteristics of its input data taken into account improves performance. Further research is needed on improving machine learning models by reflecting various characteristics of their input data.

Acknowledgement

This work was supported by a Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (MOLIT) of the Korean government in 2021 (21UGCP-B157942-02).

References

1. Ahmad A., Dey L., 2007, A k-mean clustering algorithm for mixed numeric and categorical data, Data & Knowledge Engineering, Vol. 63, pp. 503-527.
2. Ayub J., Ahmad J., Muhammad J., Aziz L., Ayub S., Akram U., Basit I., 2016, Glaucoma detection through optic disc and cup segmentation using k-mean clustering, 2016 International Conference on Computing, Electronic and Electrical Engineering (ICE Cube), pp. 143-147.
3. Bennett N. D., Croke B. F., Guariso G., Guillaume J. H., Hamilton S. H., Jakeman A. J., Marsili-Libelli S., Newham L. T., Norton J. P., Perrin C., 2013, Characterising performance of environmental models, Environmental Modelling & Software, Vol. 40, pp. 1-20.
4. Chen T., Guestrin C., 2016, XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), Association for Computing Machinery, pp. 785-794.
5. Friedman J. H., 2001, Greedy function approximation: A gradient boosting machine, Annals of Statistics, Vol. 29, No. 5, pp. 1189-1232.
6. Gray A. B., Pasternack G. B., Watson E. B., Goni M. A., Hatten J. A., Warrick J. A., 2016, Conversion to drip irrigated agriculture may offset historic anthropogenic and wildfire contributions to sediment production, Science of the Total Environment, Vol. 556, pp. 219-230.
7. Gray A. B., Pasternack G. B., Watson E. B., Warrick J. A., Goñi M. A., 2015, The effect of El Niño Southern Oscillation cycles on the decadal scale suspended sediment behavior of a coastal dry-summer subtropical catchment, Earth Surface Processes and Landforms, Vol. 40, pp. 272-284.
8. Haghiabi A. H., Nasrolahi A. H., Parsaie A., 2018, Water quality prediction using machine learning methods, Water Quality Research Journal, Vol. 53, pp. 3-13.
9. Hicks D. M., Gomez B., Trustrum N. A., 2000, Erosion thresholds and suspended sediment yields, Waipaoa River basin, New Zealand, Water Resources Research, Vol. 36, pp. 1129-1142.
10. Hollister J. W., Milstead W. B., Kreakie B. J., 2016, Modeling lake trophic state: A random forest approach, Ecosphere, Vol. 7, pp. e01321.
11. Li L., Rong S., Wang R., Yu S., 2021, Recent advances in artificial intelligence and machine learning for nonlinear relationship analysis and process control in drinking water treatment: A review, Chemical Engineering Journal, Vol. 405, pp. 126673.
12. Lin W., Sung S., Chen L., Chung H., Wang C., Wu R., Lee D., Huang C., Juang R., Peng X., 2004, Treating high-turbidity water using full-scale floc blanket clarifiers, Journal of Environmental Engineering, Vol. 130, No. 12, pp. 1481-1487.
13. Moriasi D. N., Arnold J. G., Van Liew M. W., Bingner R. L., Harmel R. D., Veith T. L., 2007, Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, Transactions of the American Society of Agricultural and Biological Engineers, Vol. 50, No. 3, pp. 885-900.
14. Muhammad S. Y., Makhtar M., Rozaimee A., Aziz A. A., Jamal A. A., 2015, Classification model for water quality using machine learning techniques, International Journal of Software Engineering and Its Applications, Vol. 9, pp. 45-52.
15. Packman A. I., MacKay J. S., 2003, Interplay of stream-subsurface exchange, clay particle deposition, and streambed evolution, Water Resources Research, Vol. 39, No. 4, pp. 1097.
16. Park J., 2021, Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river, [Korean Literature], Journal of Korean Society of Water and Wastewater, Vol. 35, pp. 83-91.
17. Park J., Hunt J. R., 2017, Coupling fine particle and bedload transport in gravel-bedded streams, Journal of Hydrology, Vol. 552, pp. 532-543.
18. Park J., Lee H., 2020, Prediction of high turbidity in rivers using LSTM algorithm, [Korean Literature], Journal of Korean Society of Water and Wastewater, Vol. 34, pp. 35-43.
19. Park R. K., 2018, An empirical comparison and verification study on the containerports clustering measurement using k-means and hierarchical clustering (average linkage method Using Cross-Efficiency Metrics, and Ward Method) and Mixed Models, [Korean Literature], Journal of Korea Port Economic Association, Vol. 34, pp. 17-52.
20. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., 2011, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, Vol. 12, pp. 2825-2830.
21. Shin Y., Kim T., Hong S., Lee S., Lee E., Hong S., Lee C., Kim T., Park M. S., Park J., 2020, Prediction of chlorophyll-a concentrations in the Nakdong River using machine learning methods, Water, Vol. 12, pp. 1822.
22. Singer M. B., Aalto R., James L. A., Kilham N. E., Higson J. L., Ghoshal S., 2013, Enduring legacy of a toxic fan via episodic redistribution of California gold mining debris, Proceedings of the National Academy of Sciences, Vol. 110, pp. 18436-18441.
23. Song J., 2017, K-means cluster analysis for missing data, [Korean Literature], Journal of Korean Data Analysis Society, Vol. 19, pp. 689-697.
24. Stevenson M., Bravo C., 2019, Advanced turbidity prediction for operational water supply planning, Decision Support Systems, Vol. 119, pp. 72-84.
25. Sutton C. D., 2005, Classification and regression trees, bagging, and boosting, Handbook of Statistics, Vol. 24, pp. 303-329.
26. Uddameri V., Silva A. L. B., Singaraju S., Mohammadi G., Hernandez E. A., 2020, Tree-based modeling methods to predict nitrate exceedances in the Ogallala aquifer in Texas, Water, Vol. 12, pp. 1023.
27. United States Geological Survey (USGS), 2009, USGS Water-Data Report 2009, 11482500 Redwood Creek at Orick, CA.
28. United States Geological Survey (USGS), 2014, National Water Information System (NWIS), https://waterdata.usgs.gov/nwis (accessed Jun. 2014).
29. Walling D., 1977, Assessing the accuracy of suspended sediment rating curves for a small basin, Water Resources Research, Vol. 13, No. 3, pp. 531-538.
30. Wang Y., Chen J., Cai H., Yu Q., Zhou Z., 2021, Predicting water turbidity in a macro-tidal coastal bay using machine learning approaches, Estuarine, Coastal and Shelf Science, Vol. 252, pp. 107276.
31. Warrick J. A., 2015, Trend analyses with river sediment rating curves, Hydrological Processes, Vol. 29, No. 6, pp. 936-949.
32. Warrick J. A., Madej M. A., Goñi M., Wheatcroft R., 2013, Trends in the suspended-sediment yields of coastal rivers of northern California, 1955-2010, Journal of Hydrology, Vol. 489, pp. 108-123.
33. Zhang D., Qian L., Mao B., Huang C., Huang B., Si Y., 2018, A data-driven design for fault detection of wind turbines using random forests and XGBoost, IEEE Access, Vol. 6, pp. 21020-21031.
34. Zhang Y., Bouadi T., Martin A., 2018, An empirical study to determine the optimal k in Ek-NNclus method, 5th International Conference on Belief Functions (BELIEF 2018), pp. 260-268.