• 대한전기학회
Mobile QR Code QR CODE : The Transactions of the Korean Institute of Electrical Engineers
  • COPE
  • kcse
  • 한국과학기술단체총연합회
  • 한국학술지인용색인
  • Scopus
  • crossref
  • orcid

  1. (Dept. of ICT Convergence, Soonchunhyang University, Republic of Korea. E-mail : wooni3804@sch.ac.kr)



Real estate price forecasting, Time-series forecasting, Korean housing, Data reconstruction, Deep learning

1. Introduction

Real estate markets play a crucial role in national and regional economies, influencing investment decisions, policy formulation, financial risk management, and regional economic disparities [1]. As housing prices directly affect household wealth and borrowing capacity, accurate forecasting of real estate price dynamics has become increasingly important. However, real estate price forecasting remains challenging due to the nature of transaction data. Unlike conventional financial time-series, real estate transactions are recorded only when they occur, resulting in irregular observation intervals and extended periods without observations. This event-driven structure produces discontinuous time-series, where underlying price dynamics evolve continuously but are not directly observed. Consequently, model performance is often constrained not by model capacity but by the quality and structure of the input data.

The Korean real estate market provides a suitable setting for addressing this issue. A nationwide transaction price disclosure system enables large-scale analysis at the individual asset level, while the market is largely centered on apartment complexes, where units within the same complex share similar physical and locational characteristics [2]. In addition, housing prices exhibit regional co-movement, indicating that price dynamics are influenced by both temporal trends and spatial interactions.

Despite these advantages, the event-driven structure of transaction data still introduces fundamental challenges for time-series modeling. Most forecasting models assume regularly spaced observations, but irregular intervals disrupt temporal continuity and hinder the learning of stable patterns and long-term dependencies. Existing approaches, such as removing incomplete observations or applying interpolation, partially address this issue but often fail to capture underlying market dynamics effectively.

This study addresses this limitation by proposing a volatility-aware reconstruction method that transforms fragmented transaction records into continuous apartment-level time-series. Transaction data are reorganized into monthly sequences, and unobserved intervals are reconstructed by combining local temporal continuity with region-level price dynamics. The proposed approach enables the reconstructed series to better reflect underlying market behavior while preserving temporal consistency.

The main contributions of this study are as follows. First, we propose a data-centric approach that transforms event-driven transaction data into continuous sequences suitable for time-series forecasting. Second, we introduce a volatility-aware reconstruction method that integrates regional price dynamics to restore unobserved intervals. Third, we demonstrate that improved temporal consistency leads to performance gains across models and regions.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed methodology. Section 4 describes the experimental design and results. Section 5 concludes the paper.

2. Related Work

Real estate price forecasting has evolved from statistical models to machine learning and deep learning approaches, with increasing emphasis on temporal dynamics, spatial relationships, and the integration of diverse data sources.

2.1 Global Advances in Real Estate Price Forecasting

Early studies focus on statistical approaches such as hedonic price models and multiple regression analysis, which explain housing prices based on structural and locational attributes [3], [4]. While these models provide interpretability, they have limited ability to capture nonlinear relationships. Time-series models such as ARIMA are used to capture temporal dependencies in aggregated data [5].

To address nonlinear patterns in the real estate market, machine learning methods including Random Forest, boosting, and support vector machines are widely applied and outperform traditional regression-based approaches [6]. As real estate prices are increasingly interpreted as time-evolving processes, deep learning models such as RNN, LSTM, and GRU have emerged as effective tools for capturing sequential dependencies and long-term temporal patterns [7]. Hybrid approaches combining statistical and deep learning models have also been explored to capture both linear and nonlinear structures [8].

More recent studies integrate diverse data sources and adopt deep learning-based sequence models to capture complex temporal patterns. For example, Chiu applied an LSTM model to forecast housing prices using housing price index data and related variables, demonstrating the effectiveness of deep learning in capturing temporal patterns [9]. Kishor also examined house price forecasting using macroeconomic fundamentals, credit conditions, and supply indicators, highlighting the importance of financial and supply-side factors [10].

In particular, recent work incorporates spatial relationships alongside temporal modeling. Graph-based approaches capture interactions between housing units or regions, while external features such as transportation accessibility and socioeconomic indicators are used to explain price variation. Ge proposes a combined LSTM and Graph CNN framework to model both temporal trends and spatial dependence [11], and Moghimi et al. developed a graph-based model to address spatial and temporal irregularities in real estate data [12]. These studies emphasize the importance of jointly modeling temporal and spatial dependencies.

Recent studies address incomplete observations and data sparsity from a spatial perspective, such as spatial interpolation, geostatistical modeling, and sample expansion. Kim et al. explored machine learning and spatial interpolation methods to estimate house prices in locations without transaction records [13]. Sellam et al. proposed a multi-head gated attention model for spatial interpolation, while Cellmer and Kobylińska combined machine learning with geostatistical methods to incorporate spatial effects in housing price prediction [14], [15]. Zhang et al. further addressed sparsity in housing price index construction by expanding usable samples through spatial relationships between housing units [16].

While these approaches improve estimation in sparse settings, they focus on spatial relationships rather than reconstructing irregular transaction records into continuous time-series. Specifically, limited attention has been given to reconstructing irregular transaction data into continuous apartment-level time-series by jointly modeling temporal continuity and regional market dynamics. To address this gap, this study proposes a volatility-aware reconstruction method that transforms sparse transaction records into continuous apartment-level time-series.

2.2 Evolution of Real Estate Price Forecasting in Korea

In Korea, real estate forecasting studies have developed under a transaction-based data environment shaped by the real estate transaction price disclosure system. This system provides detailed records of individual property transactions. However, observations occur only when transactions take place, resulting in inherently irregular and sparse time-series at the apartment level.

Due to this structure, early studies have relied on aggregated price indices rather than raw transaction data [17]. By aggregating transaction records at the regional level, these approaches transform irregular observations into regularly structured time-series suitable for conventional models. Unlike many global studies that rely on aggregated indices, Korean studies frequently utilize transaction-level data, which leads to irregular and sparse observation patterns.

Subsequent studies incorporate macroeconomic and property-level variables to improve predictive performance. Bae and Yu integrated macroeconomic indicators such as interest rates and price indices with apartment-level features [18]. Spatial and regional factors are widely considered, reflecting the influence of socioeconomic conditions, infrastructure, and accessibility on housing prices [19]. To support model training, transaction data are often reorganized into structured formats, such as monthly aggregated time-series at the regional level [20]. Recent studies apply deep learning models with regional information, showing that prediction performance is sensitive to spatial unit definitions and local contextual features. Other studies incorporate spatial context using surrounding facility information, demonstrating that regional characteristics improve prediction accuracy [21].

Despite these efforts, most Korean studies rely on aggregated regional time-series, spatial estimation, or structured inputs. As a result, transaction data remain sparse and discontinuous at the individual apartment level, limiting the ability to capture continuous price dynamics over time. Motivated by this limitation, this study reconstructs transaction-driven data into continuous apartment-level time-series by explicitly modeling event-driven sparsity and incorporating regional price dynamics. This approach differs from prior methods that primarily rely on spatial estimation or aggregated representations, enabling more realistic modeling of apartment-level price trajectories.

3. Methodology

This section presents a framework for constructing apartment-level time-series from irregular, event-driven transaction records. The objective is to transform fragmented observations into structured representations suitable for predictive modeling. Unlike conventional approaches that treat missing values as isolated issues, the proposed method focuses on reconstructing temporal continuity.

Figure 1 illustrates the overall process, in which raw transaction records are transformed into structured time-series representations. The proposed framework consists of three stages: data preprocessing, volatility-aware reconstruction, and model development. Each stage addresses key limitations of transaction data, including heterogeneity across sources, irregular observation intervals, and unobserved periods.

Fig. 1. Overview of the proposed data construction approach for apartment-level time-series forecasting

../../Resources/kiee/KIEE.2026.75.6.1390/fig1.png

3.1 Data Description

The dataset used in this study is obtained from a publicly available Kaggle dataset constructed from real estate transaction records collected via a Korean public API [22]. The data include apartment-level transaction information such as transaction prices and transaction dates. Samples corresponding to three metropolitan cities, namely Seoul, Busan, and Daegu, are used for analysis. The study period spans from January 2015 to April 2023, covering a total of 100 months. The dataset is organized as a monthly panel, where each apartment unit forms a fixed-length time-series.

Figure 2 shows the distribution of observed months per apartment unit across the three regions before applying the minimum-observation filtering criterion. The distribution is highly concentrated in low-observation intervals, indicating that many apartment units have only a small number of observed transaction months. This pattern reflects the event-driven sparsity and irregular temporal structure of the original transaction data.

Fig. 2. Distribution of observed transaction months per apartment unit in the original dataset

../../Resources/kiee/KIEE.2026.75.6.1390/fig2.png

To construct a reliable experimental dataset for sequence-based forecasting, apartment units with fewer than 50 observed months were excluded. After applying this filtering criterion, the final dataset consists of 1,486 apartment units and 148,600 monthly observations, including both observed and missing entries, of which 90,966 correspond to actual transaction records.

At the regional level, the dataset includes 546 apartment units and 33,124 transactions in Seoul, 441 units and 27,123 transactions in Busan, and 499 units and 30,719 transactions in Daegu. Due to the event-driven nature of real estate transactions, observations are recorded only when transactions occur, resulting in substantial sparsity at the apartment level. On average, each apartment unit has approximately 61 observed months out of the total 100-month period, corresponding to an observation density of 0.61 and a missing rate of 0.39.

3.2 Data Preprocessing

Locational and structural factors are essential determinants of housing prices in the Korean market. Variables reflecting transportation accessibility, educational infrastructure, and building characteristics are therefore included to capture both regional context and property-specific attributes. Transaction records and external data are integrated into a unified structure. Because data sources differ in spatial identifiers, temporal formats, and internal structures, a consistent analytical schema is defined.

Temporal and spatial standardization is performed to ensure consistent agreement on transaction dates and regions. Transaction dates are converted into monthly timestamps, and regional identifiers are unified to ensure consistent matching. External data are transformed into the same spatial and temporal units. School data are aggregated at the regional level, and subway accessibility data are reshaped into a monthly format. These variables are then merged with transaction records using region and time as common keys, resulting in a dataset that integrates transaction information with locational context.

Finally, all variables are organized into a consistent data structure. The dataset is sorted by apartment attributes and time in preparation for sequence construction. Although this process resolves inconsistencies across sources, transaction records remain sparse, as observations exist only when transactions occur. Therefore, an additional reconstruction step is required to generate continuous time-series.

3.3 Volatility-Aware Data Reconstruction

This stage transforms irregular transaction records into continuous apartment-level time-series. Since transactions occur only upon sale, reconstructing transaction records to handle unobserved intervals is a prerequisite for applying sequence-based models. The Apartment-Unit Identifier (AUID) is a unique code that defines the smallest unit of each apartment and is assigned based on location, apartment complex, and exclusive area. Transactions within the same month are aggregated into a single observation, and monthly sequences are constructed over the full study period.

This process converts irregular transaction records into aligned apartment-level panels, where unobserved months are explicitly represented. Each apartment sequence is linked to an administrative regional identifier to preserve spatial context. In this study, experiments are conducted separately for each metropolitan city (Seoul, Busan, and Daegu), with regional information defined at the district level (Si-Gun-Gu), representing the administrative subdivision within each city. Accordingly, each apartment is associated with a corresponding district within its city.

To ensure data reliability, apartments with extremely sparse observations are excluded. Although this process organizes irregular transaction records into structured sequences, it does not resolve the absence of price observations in months without transactions. Therefore, an additional reconstruction method is required to estimate these unobserved values and recover continuous price trajectories.

The reconstruction integrates two complementary components: a local estimate and a regional estimate, to recover continuous price trajectories. The local estimate captures price changes based on the transaction history of an individual apartment, while the regional estimate reflects overall market trends shared across apartments within the same region. By combining these two components, the method aims to estimate realistic price movements during months without transactions.

The reconstructed price is defined as

(1)
$\hat{P} = w_{lin} \cdot P_{lin} + w_{reg} \cdot P_{reg}$

where Plin denotes the local estimate obtained from linear interpolation and Preg denotes the regional estimate derived from market-level dynamics. The weights wlin and wreg control the relative influence of the local and regional estimates. The weights are designed to balance local temporal continuity and regional dynamics. This design is motivated by the complementary properties of the two estimates: local estimates become more reliable as the number of observed transactions increases, whereas regional trends provide more stable estimates under sparse observation conditions by leveraging aggregated market information. To capture overall market behavior, the regional price trend is first constructed. At each time step, the regional average price is calculated at the district level (Si-Gun-Gu), where all observed apartment transaction prices within the same district and month are aggregated:

(2)
$\overline{P}_t = \frac{1}{N_t} \sum_k P_{k,t}$

where Pk,t represents the transaction price of apartment k at time t, and Nt is the number of apartments with observed transactions at that time. This value represents the average housing price within a district at a given month.

To reduce short-term fluctuations and noise, the resulting series is smoothed using a 3-month moving average, yielding a stable regional price trend denoted as St. The choice of the 3-month window is further validated through sensitivity analysis presented in Section 4.1.

Based on this smoothed trend, a regional volatility factor is computed as

(3)
$M_t = \frac{S_t}{S_{t-1}}$

This factor represents the relative change in the regional market compared to the previous time step.

The regional estimate is then obtained by applying this factor to the previous observed price:

(4)
$P_{reg} = P_{t-1} \cdot M_t$

This indicates that when an apartment has no transaction in a given month, its price is updated according to the overall market movement of the region.

The local estimate is obtained using linear interpolation based on the apartment’s own transaction history. This method connects observed prices across time, ensuring smooth transitions between known values.

To balance these two estimations, their fidelity is adjusted based on the observation density for each apartment. The weight assigned to the local estimate increases linearly with the data fidelity of each apartment, while the regional weight is defined as the complementary portion so that the two weights sum to one. Apartments with many observed transactions provide reliable information for interpolation, so the local estimate is given greater importance. In contrast, apartments with few observations rely more on regional trends, since their individual price history is less informative. This adaptive weighting ensures stable and realistic reconstruction across varying data conditions.

The final price is obtained as a weighted combination of the local and regional estimates. Reconstruction is applied only to months without transactions, while observed transaction prices remain unchanged. The proposed method is applied to transaction prices, whereas other variables are interpolated using standard linear methods.

Consistent with this reconstruction framework, this study focuses on forecasting the temporal evolution of prices for existing apartment units using reconstructed time-series data.

3.4 Forecasting Model Development

The reconstructed time-series ensure temporal continuity prior to sequence construction. Although interpolation may utilize both past and future observations, prediction targets are strictly excluded from input construction. Each input sequence contains only observations preceding the prediction time step, preventing data leakage.

The reconstructed time-series are transformed into model-ready sequences using a sliding-window approach. Sequences are generated chronologically without random shuffling. Each input consists of 12 months of observations, and the subsequent price is used as the prediction target. Specifically, a sequence from time t to t+11 is used to predict the price at time t+12, and this process is repeated across the entire time-series.

To ensure a proper time-series forecasting setting, the dataset is partitioned using a consistent temporal criterion. For each apartment unit (AUID), the full observation period (100 months) is divided chronologically into training (first 70 months), validation (next 10 months), and testing (final 20 months). This time-based split is applied consistently across all AUIDs, ensuring that training data always precede validation and test data.

Each sequence includes both apartment-specific and regional variables. Numerical features are standardized, and categorical identifiers are encoded to capture spatial heterogeneity. Machine learning models use flattened feature vectors, whereas deep learning models retain sequential structure to capture temporal dependencies. To evaluate the proposed method, three data configurations are considered: removal of incomplete sequences, linear interpolation, and the proposed volatility-aware reconstruction.

4. Experimental Results

4.1 Experimental Setup

Experiments were conducted using apartment transaction data from Seoul, Busan, and Daegu. All experiments were conducted on the final dataset described in Section 3.1. To ensure a fair comparison across data construction strategies, a common eligible AUID pool was first defined. Specifically, AUIDs with fewer than 50 observed transaction months over the 100-month study period were excluded to avoid unreliable sequence construction from extremely sparse histories. This filtering criterion was applied identically to all experimental settings. The dataset was partitioned based on a consistent time-based split, where the full observation period was divided chronologically into training, validation, and testing subsets, ensuring that training data always precede validation and test data across all AUIDs.

Table 1. Model architecture and key parameters

Model Key Parameters
XGBoost n_estimators=1000, max_depth=7
LightGBM n_estimators=1000, num_leaves=31
LSTM 1 recurrent layer, hidden_dim=128
GRU 1 recurrent layer, hidden_dim=128
Transformer d_model=128, heads=8, encoder layers=2

Table 2. Sensitivity analysis of moving average window (GRU model)

Region Window R2 MAE RMSE MAPE
Seoul 1 0.9855 6,013 7,994 5.56
3 0.9883 5,315 7,217 4.93
6 0.9878 5,535 7,380 5.13
Busan 1 0.9795 3,256 4,334 10.23
3 0.9807 3,196 4,226 10.13
6 0.9804 3,232 4,263 10.17
Daegu 1 0.9880 1,103 1,662 4.69
3 0.9884 1,076 1,639 4.58
6 0.9881 1,103 1,660 4.72

All methods were evaluated on the same set of AUIDs, ensuring that performance differences arise from the data construction strategy rather than differences in the underlying sample composition. In the transaction-driven sampling setting, incomplete sequences are excluded at the sequence level rather than removing entire apartment units. Thus, all methods are evaluated on an identical set of AUIDs, and differences arise solely from how missing observations are handled. Apartment-level time-series were then constructed, and forecasting performance was evaluated under three data reconstruction strategies: transaction-driven sampling, linear interpolation, and the proposed volatility-aware reconstruction. Categorical regional identifiers were encoded using learnable embeddings and combined with numerical features.

Table 3. Forecasting performance comparison under different data reconstruction strategies in Seoul

Strategy Model R2 MAE RMSE MAPE
Transaction-driven Sampling XGBoost 0.9225 10,819 17,930 9.28
LightGBM 0.9237 10,420 17,795 8.78
LSTM 0.9411 10,246 15,637 9.57
GRU 0.955 8,999 13,666 8.03
Transformer 0.9271 10,850 17,392 9.39
Linear Interpolation XGBoost 0.9723 7,814 11,094 7.07
LightGBM 0.9521 8,324 14,599 6.56
LSTM 0.9775 8,367 10,009 8.03
GRU 0.9874 5,687 7,488 5.27
Transformer 0.9705 8,593 11,446 8.18
Volatility-Aware Reconstruction XGBoost 0.9709 7,623 11,358 6.94
LightGBM 0.9527 7,786 14,486 6.17
LSTM 0.9754 8,789 10,453 8.41
GRU 0.9883 5,315 7,217 4.93
Transformer 0.9734 7,616 10,870 6.89

Five models were evaluated: XGBoost, LightGBM, LSTM, GRU, and Transformer. For deep learning models (LSTM, GRU, and Transformer), models were trained using the Adam optimizer with a learning rate of 0.001 and mean squared error (MSE) loss. Training was conducted for up to 100 epochs with early stopping to prevent overfitting. Table 1 summarizes the key model configurations. Performance was assessed using R², MAE, RMSE, and MAPE, with R² and MAPE as the primary metrics. All experiments were implemented in Python using PyTorch.

Table 4. Forecasting performance comparison under different data reconstruction strategies in Busan

Strategy Model R2 MAE RMSE MAPE
Transaction-driven Sampling XGBoost 0.8938 3,997 8,716 9.33
LightGBM 0.8932 4,012 8,742 9.43
LSTM 0.9321 3,975 6,970 10.66
GRU 0.9297 3,978 7,093 10.44
Transformer 0.8857 4,511 9,043 11.06
Linear Interpolation XGBoost 0.9708 2,574 5,236 6.14
LightGBM 0.9459 2,779 7,125 5.23
LSTM 0.9861 2,497 3,606 7.39
GRU 0.98 3,347 4,337 10.7
Transformer 0.9611 3,762 6,045 10.12
Volatility-Aware Reconstruction XGBoost 0.9735 2,391 4,945 5.86
LightGBM 0.9499 2,547 6,805 5.18
LSTM 0.986 2,454 3,602 7.11
GRU 0.9807 3,196 4,226 10.13
Transformer 0.9655 3,494 5,644 9.22

In addition, the sensitivity of the smoothing window used in the regional volatility factor was examined. Additional experiments were conducted using 1-, 3-, and 6-month moving averages, and the results are summarized in Table 2. As shown in Table 2, the 3-month window provided the best performance across all regions. Compared with the 1-month window, it reduced short-term volatility, whereas the 6-month window tended to smooth out recent changes excessively. These results indicate that the 3-month window offers a balanced trade-off between stability and responsiveness.

4.2 Results and Analysis

Tables 35 present the performance comparison across different data handling strategies and model architectures. MAE and RMSE values are reported in units of 10,000 KRW. Across all regions, the choice of data reconstruction strategy had a substantial impact on performance, comparable to differences between model architectures.

Table 5. Forecasting performance comparison under different data reconstruction strategies in Daegu

Strategy Model R2 MAE RMSE MAPE
Transaction-driven Sampling XGBoost 0.9443 2,055 3,280 7.95
LightGBM 0.9421 2,011 3,343 7.62
LSTM 0.9528 1,956 3,018 7.72
GRU 0.9543 1,902 2,971 7.49
Transformer 0.9419 2,086 3,349 8.07
Linear Interpolation XGBoost 0.9773 1,467 2,313 5.75
LightGBM 0.9673 1,648 2,776 6.15
LSTM 0.9841 1,247 1,933 5.1
GRU 0.9871 1,188 1,740 5.26
Transformer 0.9751 1,701 2,422 6.86
Volatility-Aware Reconstruction XGBoost 0.9797 1,369 2,170 5.35
LightGBM 0.9738 1,307 2,463 4.83
LSTM 0.985 1,179 1,863 4.77
GRU 0.9884 1,076 1,639 4.58
Transformer 0.9769 1,611 2,314 6.51

Removing incomplete observations consistently yielded the lowest performance. For example, in Seoul, the GRU model achieved an R² of 0.955 and a MAPE of 8.03%. Applying linear interpolation led to substantial improvements across all regions by restoring temporal continuity. In Seoul, the GRU model improved to an R² of 0.9874, with MAPE reduced from 8.03% to 5.27%. This confirmed that preserving consistent time intervals is essential for effective time-series modeling.

The proposed volatility-aware reconstruction further improved performance over linear interpolation. In Seoul, GRU achieved an R² of 0.9883 and reduced MAPE to 4.93%. Similar improvements were observed in other regions, such as Busan, where XGBoost reduced MAPE from 6.14% to 5.86%, and Daegu, where GRU improved from 5.26% to 4.58%. These results indicate that incorporating regional price dynamics enhances forecasting accuracy beyond local continuity.

Model-wise, sequence-based models such as GRU and LSTM achieved strong performance when applied to reconstructed time-series, while tree-based models such as XGBoost and LightGBM maintained stable performance across regions. This suggests that improvements in data structure benefit different model types in distinct ways.

Table 6. Performance comparison by observation density (GRU model)

Group Strategy R2 MAE RMSE MAPE
Low
(20–49)
Transaction-driven 0.9625 6,552 13,758 9.21
Linear Interpolation 0.9988 1,311 2,631 2.88
VA Reconstruction 0.9983 1,635 3,217 2.47
Medium
(50–79)
Transaction-driven 0.9747 4,928 9,114 8.60
Linear Interpolation 0.9945 2,847 4,433 5.53
VA Reconstruction 0.9931 3,389 4,980 6.68
High
(80+)
Transaction-driven 0.9824 5,703 10,015 10.14
Linear Interpolation 0.9842 5,612 9,381 9.49
VA Reconstruction 0.9846 5,526 9,272 9.41

These characteristics are reflected in the magnitude of performance improvement. In Seoul, reconstruction led to the largest gains; for example, under the GRU model, MAE decreased from 8,999 to 5,315. This indicates that restoring temporal continuity and incorporating regional dynamics are particularly effective in highly volatile markets. In Daegu, improvements are smaller but still consistent. Under the GRU model, MAE decreased from 1,902 to 1,076, reflecting relatively smooth price dynamics.

This suggests that reconstruction contributes to stable forecasting even in less volatile markets. Busan shows intermediate behavior, with performance improvements observed across reconstruction strategies but greater variability across models. For example, under the reconstruction method, XGBoost achieves the lowest MAE (2,391), while GRU shows relatively higher errors. This indicates that forecast performance in Busan is more sensitive to model choice and data characteristics.

Overall, the results demonstrate that the effectiveness of the reconstruction method is closely associated with regional market characteristics, with larger gains in more dynamic markets and consistent improvements across all regions.

To further analyze the effect of observation density on reconstruction performance, we conducted a breakdown analysis by grouping AUIDs according to the number of observed transaction months across all regions. The dataset was divided into three groups: low-frequency (20–49 observed months), medium-frequency (50–79), and high-frequency (80 or more). The results are summarized in Table 6. As shown in Table 6, the effectiveness of the proposed method varies across observation density levels.

In the low-frequency group, a substantial performance improvement is observed when interpolation is applied, with MAPE decreasing significantly compared to the transaction-driven sampling setting. This improvement is primarily attributed to the restoration of temporal continuity in the input sequences, as fragmented sequences limit the ability of sequence-based models to capture temporal dependencies. In contrast, interpolation reconstructs continuous time-series, enabling more effective learning of temporal patterns. In the medium-frequency group, the proposed method shows slightly lower performance than linear interpolation. This suggests that when sufficient local observations are available, linear interpolation can capture temporal patterns effectively, and the incorporation of regional dynamics may introduce additional variability. In the high-frequency group, performance differences between methods become marginal, as most observations are already available. However, the proposed method shows marginal improvements, indicating that incorporating regional trends can contribute to more stable predictions.

These results indicate that the effectiveness of the proposed approach depends on observation density, highlighting both its strengths and limitations. In particular, the method is most beneficial when the data are sparse but still contain sufficient structure, while its relative advantage decreases when local observations are already sufficient. Other factors, such as price volatility and regional scale, may also influence the effectiveness of the proposed approach, and further analysis of these factors is left for future work.

5. Conclusions

This study addresses a key structural limitation of real estate transaction data, where event-driven sparsity results in discontinuous time-series representations. To overcome this issue, we propose a volatility-aware reconstruction method that transforms transaction records into continuous apartment-level time-series. Experimental results demonstrate that ensuring temporal continuity is essential for accurate real estate price forecasting. While interpolation improves performance by restoring continuity, the proposed approach further enhances accuracy by incorporating regional market dynamics and yields consistent performance improvements across models and regions.

This study contributes to the literature by treating unobserved intervals as a structural modeling problem rather than a simple preprocessing task. By integrating apartment-level temporal continuity with region-level dynamics, the proposed approach provides a more realistic representation of housing price evolution. The results also indicate that improvements in data structure can have an impact comparable to model selection. While sequence-based models benefit from reconstructed temporal patterns, tree-based models remain competitive. This suggests that enhanced data quality consistently improves performance across model types. Consistent performance gains across Seoul, Busan, and Daegu demonstrate the robustness of the proposed approach under different market conditions, supporting its applicability to event-driven real estate data.

Despite these contributions, several limitations remain. The use of a fixed moving average window may introduce a lag in reflecting sudden market changes, such as rapid macroeconomic shocks. Although the 3-month window provides a balanced trade-off between stability and responsiveness, it may not fully capture abrupt shifts in market conditions. Region-level aggregation may not fully capture complex spatial interactions, and the set of explanatory variables remains limited. In addition, formal statistical significance tests were not conducted in this study, which may limit the strength of the performance comparison. Other factors such as price volatility and regional scale may influence performance, and their effects warrant further investigation. Finally, the proposed framework assumes an offline reconstruction setting and does not fully reflect a strictly causal real-time forecasting scenario. Future research can extend this work by incorporating more detailed spatial modeling and adaptive reconstruction mechanisms, as well as validating the approach in other markets.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (Grant Nos. RS-2025-16069706 and RS-2025 00516023), and the Soonchunhyang University Research Fund.

References

1 
Q. Sun, S. A. Javeed, Y. Tang, Y. Feng, "The impact of housing prices and land financing on economic growth: Evidence from Chinese 277 cities at the prefecture level and above," PLOS ONE, vol. 19, no. 4, 2024. DOI
2 
I. Min, "The impact of the disclosure of direct transaction information on housing sale prices and transaction volume," The Korea Spatial Planning Review, vol. 117, pp. 167-179, 2023. DOI
3 
S. Rosen, "Hedonic prices and implicit markets: Product differentiation in pure competition," Journal of Political Economy, vol. 82, no. 1, pp. 34-55, 1974. DOI
4 
Q. Zhang, "Housing price prediction based on multiple linear regression," Scientific Programming, vol. 2021, 2021. DOI
5 
G. W. Crawford, M. C. Fratantoni, "Assessing the forecasting performance of regime-switching, ARIMA and GARCH models of house prices," Real Estate Economics, vol. 31, no. 2, pp. 223-243, 2003. DOI
6 
E. A. Antipov, E. B. Pokryshevskaya, "Mass appraisal of residential apartments: An application of Random Forest for valuation and a CART-based approach for model diagnostics," Expert Systems with Applications, vol. 39, no. 2, pp. 1772-1778, 2012. DOI
7 
H. Kim, Y. Kwon, Y. Choi, "Assessing the impact of public rental housing on the housing prices in proximity: Based on the regional and local level of price prediction models using long short-term memory (LSTM)," Sustainability, vol. 12, no. 18, pp. 7520, 2020. DOI
8 
A. S. Temür, M. Akgün, G. Temür, "Predicting housing sales in Turkey using ARIMA, LSTM and hybrid models," Journal of Business Economics and Management, vol. 20, no. 5, pp. 920-938, 2019. DOI
9 
K. C. Chiu, "A long short-term memory model for forecasting housing prices in Taiwan in the post-epidemic era through big data analytics," Asia Pacific Management Review, vol. 29, no. 3, pp. 273-283, 2024. DOI
10 
N. K. Kishor, "Forecasting house prices: The role of fundamentals, credit conditions, and supply indicators," Journal of Real Estate Finance and Economics, vol. 70, no. 1, pp. 121-143, 2025. DOI
11 
C. Ge, "A LSTM and Graph CNN combined network for community house price forecasting," 2019. DOI
12 
F. Moghimi, R. A. Johnson, A. Krause, "Rethinking real estate pricing with Transformer graph neural networks (T-GNN)," 2023. DOI
13 
J. Kim, Y. Lee, M. H. Lee, S. Y. Hong, "A comparative study of machine learning and spatial interpolation methods for predicting house prices," Sustainability, vol. 14, no. 15, pp. 9056, 2022. DOI
14 
Z. A. Sellam, C. Distante, A. Taleb-Ahmed, P. L. Mazzeo, Article no. 125276, "Boosting house price estimations with multi-head gated attention," Expert Systems with Applications, vol. 259, 2025. DOI
15 
R. Cellmer, K. Kobylińska, "Housing price prediction - machine learning and geostatistical methods," Real Estate Management and Valuation, vol. 33, no. 1, pp. 1-10, 2025. DOI
16 
H. Y. Zhang, Z. S. Song, Z. Chen, "An approach for constructing spatially paired pseudo repeat-sales housing price indices in China," Journal of Housing and the Built Environment, vol. 40, no. 1, pp. 1-17, 2025. DOI
17 
H. Park, J. H. Kim, "A study on the estimation of Seoul apartment price index using spatiotemporal autoregression model," The Korea Spatial Planning Review, vol. 42, pp. 8, 2004. Google Search
18 
S. W. Bae, J. S. Yu, "Predicting the real estate price index using machine learning methods and time series analysis model," Housing Studies Review, vol. 26, no. 1, pp. 107-133, 2018. DOI
19 
J. Lee, J. P. Ryu, "Prediction of housing price index using artificial neural network," Journal of the Korea Academia-Industrial cooperation Society, vol. 22, no. 4, pp. 228-234, 2021. DOI
20 
J. Lee, H. Kim, G. Shim, "Prediction model of real estate transaction price with the LSTM model based on ai and bigdata," The International Journal of Advanced Culture Technology, vol. 10, no. 1, pp. 274-283, 2022. DOI
21 
E. J. Han, S. H. Chun, no. 127748, "A long short-term memory model using kernel density estimation for forecasting apartment prices in Seoul city," Expert Systems with Applications, vol. 283, 2025. DOI
22 
brainer3220, Available: https://www.kaggle.com/datasets/brainer3220/korean-real-estate-transaction-data, "Korean Apartment Deal Data, Kaggle," 2023. DOI

저자소개

김민중 (Min-Joong Kim)
../../Resources/kiee/KIEE.2026.75.6.1390/au1.png

Minjoong Kim received his B.S. degree in AI Big Data Engineering from Soonchunhyang University, Asan, South Korea, in 2024. Currently, he is pursuing a master’s degree in the Department of ICT Convergence at Soonchunhyang University, Asan, South Korea. His research interests include real estate price forecasting, time-series data analysis, and deep learning.

김현우 (Hyeon-Woo Kim)
../../Resources/kiee/KIEE.2026.75.6.1390/au2.png

Hyeonwoo Kim received his B.S. degree in Information and Communication Engineering from Hansung University, Seoul, Korea, in 2017, and his Ph.D. degree in Electrical Engineering from Korea University, Seoul, Korea, in 2024. He is currently a faculty member in the Department of Computer Science and Engineering at Soonchunhyang University, Asan, South Korea. His research interests include computer vision, deep learning, face image analysis, and generative models.