| Title |
Development of an AI-Driven Imputation Model for Floor Area Ratio Missing Data Incorporating Parcel-Level Spatial, Temporal, and Land Use Variables |
| Authors |
박동준 (Park, Dongjoon); 이선재 (Lee, Sunjae); 강범준 (Kang, Bumjoon) |
| DOI |
https://doi.org/10.5659/JAIK.2025.41.12.35 |
| Keywords |
Building Registry; Data Imputation; Random Forest; Integrated Nested Laplace Approximation; FAR |
| Abstract |
South Korea's national building registry contains 5.8 million building records, 52.7 percent of which lack Floor Area Ratio (FAR)
data. These gaps are likely systematic, stemming from the successive administrative digitization efforts of the 1990s, and are therefore
classified as Missing Not at Random (MNAR). Of the records missing FAR, 96.2 percent also lack other variables such as lot area, which
makes simple imputation methods ineffective. This study develops adaptive imputation methods that combine Random Forest with Integrated
Nested Laplace Approximation (RF-INLA) and incorporate spatial, temporal, and land-use factors under different data-availability conditions.
The models were developed and validated on separate training and validation sets, each containing 100,800 buildings stratified across 252
administrative districts. When lot area is available, a standalone Random Forest model predicts 70.4 percent of FAR values within a 10
percent error margin. When lot area is unavailable, the standalone model has a mean absolute percentage error (MAPE) of 32.9 percent,
which the proposed RF-INLA model reduces to 26.8 percent. Spatial and temporal usage factors account for 32.4 percent of total feature
importance. The integrated approach is especially effective for industrial land uses, raising R² by 186.6 percent for warehouses and 92.9
percent for factories, compared with only 1.6 percent for residential areas. |
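The abstract reports results with three kinds of metric: MAPE, the share of predictions within a 10 percent error margin, and relative changes in R². The following is a minimal sketch of how such metrics are conventionally computed, assuming NumPy arrays of true and predicted FAR values. The function names are hypothetical, and the paper's exact definitions are not given here; in particular, treating the 10 percent margin as inclusive and reading "increasing R² by 186.6 percent" as a relative (not absolute) change are assumptions.

```python
import numpy as np

# Hypothetical helper names; this is not the paper's evaluation code.


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100)


def share_within(y_true, y_pred, tol=0.10):
    """Percent of predictions whose relative error is at most `tol` (inclusive bound is an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(rel_err <= tol) * 100)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_change(before, after):
    """Relative change in percent, e.g. of R² between a baseline and an improved model (assumed reading)."""
    return (after - before) / abs(before) * 100


# Toy FAR values in percent; these are illustrative, not data from the study.
far_true = [100, 200, 250, 400]
far_pred = [105, 230, 250, 350]
print(mape(far_true, far_pred))          # ≈ 8.125
print(share_within(far_true, far_pred))  # 50.0 (two of four within 10 percent)
print(relative_change(0.25, 0.5))        # 100.0
```

If scikit-learn is used instead, note that its `mean_absolute_percentage_error` returns a fraction rather than a percentage, so it must be multiplied by 100 to match figures such as 32.9 percent.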
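The abstract states that the training and validation sets each contain 100,800 buildings stratified across 252 administrative districts. That total works out to exactly 400 buildings per district (252 × 400 = 100,800) if allocation is equal. The sketch below assumes that equal allocation and that the two sets share no buildings; neither the allocation rule nor the function name `stratified_disjoint_split` comes from the paper, and the registry records are synthetic.

```python
import random
from collections import defaultdict


def stratified_disjoint_split(records, n_per_district, seed=0):
    """Draw disjoint training and validation samples with n_per_district buildings
    per district in each set. Equal per-district allocation is an assumption.

    records: iterable of (building_id, district) pairs.
    """
    by_district = defaultdict(list)
    for building_id, district in records:
        by_district[district].append(building_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, valid = [], []
    for district in sorted(by_district):
        ids = by_district[district]
        if len(ids) < 2 * n_per_district:
            raise ValueError(f"district {district!r} has only {len(ids)} buildings")
        # Sample without replacement, then split so the two sets never overlap.
        sample = rng.sample(ids, 2 * n_per_district)
        train.extend(sample[:n_per_district])
        valid.extend(sample[n_per_district:])
    return train, valid


# Synthetic registry (hypothetical): 252 districts with 1,000 buildings each.
records = [(f"d{d}-b{i}", d) for d in range(252) for i in range(1000)]
train, valid = stratified_disjoint_split(records, n_per_district=400)
print(len(train), len(valid))  # 100800 100800
```

Sampling both sets from one without-replacement draw per district guarantees disjointness while keeping every district equally represented in each set.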