Title A Study on the Real-time Failure Prediction Framework based on Machine Learning to Ensure Availability of Computing Resources
Authors 최승호(Seungho Choi) ; 서형준(Hyungjun Seo) ; 노재춘(Jaechun No) ; 박성순(Sungsoon Park)
DOI https://doi.org/10.5573/ieie.2019.56.4.63
Page pp.63-76
ISSN 2287-5026
Keywords Online failure prediction ; System monitoring ; Automated machine learning
Abstract The reliability and availability of server and storage systems have become important as a growing number of services run on them, and predicting system failures in advance has therefore become a major challenge. To ensure the availability of complex systems, several studies have sought to predict the critical system component faults that incur the highest costs, and to obtain highly accurate predictive models by applying optimal data processing and prediction algorithms based on system knowledge. However, obtaining such an optimal model requires repeated data analysis and processing, along with repeated optimization and comparison of predictive models, all of which rely on empirical knowledge, and only some of these steps are applied in practice. Achieving an optimized predictive model therefore takes considerable time and effort. In this paper, we propose a strategy that automates the process of obtaining an optimized predictive model at minimum cost. Our method combines monitoring systems that collect data shown to be important in existing studies with automated machine learning, in order to automate feature engineering, algorithm selection, and model optimization and to enable real-time failure prediction. In addition, the concepts and methods of failure prediction, which are scattered across various papers, are systematically organized to inform the design of our framework.