Title |
Strategies to Enhance Performance in Machine Learning-based Construction Duration Estimation Through Imputation of Missing Data in Actual Construction Duration Dataset |
Authors |
이하늘(Lee, Ha-Neul) ; 강윤호(Kang, Yun-Ho) ; 윤영채(Yun, Yeong-Chae) ; 윤석헌(Yun, Seok-Heon) |
DOI |
https://doi.org/10.5659/JAIK.2024.40.2.267 |
Abstract |
In construction projects, the schedule is often set from the project manager's experience or past construction records rather than from a
quantitative workload analysis. Accurate prediction requires estimates based on actual construction duration data that account for the workload.
However, training machine learning models to predict construction duration requires large datasets, and missing data is a common
challenge. This study aims to improve the learning performance of construction duration prediction models by applying and comparing
various imputation methods in the data preprocessing stage. Imputation methods suitable for machine learning model training were proposed
based on the average error rate. The results showed that median imputation was the most suitable single imputation method, while
random forest regression imputation stood out among the multiple imputation methods. Additionally, as the volume of data increased,
the regression imputation methods within multiple imputation proved more suitable than the single imputation methods. |
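To make the idea of comparing imputation methods by an average error rate concrete, the following is a minimal sketch, not the study's actual pipeline. It hides two known values in a small column, fills them by single imputation, and scores the filled values with a mean absolute percentage error. Everything here is an assumption made for illustration: the floor-area numbers are invented, the helper names `impute_column` and `mean_abs_pct_error` are hypothetical, and mean imputation is included only as a comparison baseline. The sketch also scores the imputed values directly, whereas the study ranks methods by the error of the downstream duration prediction model, and it does not cover the random forest regression imputation mentioned above.

```python
from statistics import mean, median


def impute_column(values, strategy):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]


def mean_abs_pct_error(actual, predicted):
    """Average absolute error as a percentage of the actual value."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical floor areas (m2) for six projects; one project is an outlier.
true_values = [120.0, 135.0, 128.0, 140.0, 900.0, 132.0]
# The same column with two entries hidden to simulate missing data.
masked = [120.0, None, 128.0, 140.0, 900.0, None]
held_out = [i for i, v in enumerate(masked) if v is None]

errors = {}
for strategy in ("mean", "median"):
    filled = impute_column(masked, strategy)
    errors[strategy] = mean_abs_pct_error(
        [true_values[i] for i in held_out],
        [filled[i] for i in held_out],
    )

mean_error = errors["mean"]
median_error = errors["median"]
print(f"mean imputation error:   {mean_error:.1f}%")
print(f"median imputation error: {median_error:.1f}%")
```

In this toy column the single outlier pulls the mean far from the typical value, so the median fill lands closer to the hidden entries. That only illustrates why median imputation can be robust on skewed data; it does not reproduce the study's result.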