Hard Drive Failure Prediction: A Rule-Based Approach
The ability to accurately predict an impending hard disk failure is important for reliable storage system design. The facility provided by most hard drive manufacturers, S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology), has been shown by current research to have poor predictive value, and finding alternatives to it for predicting disk failure is an area of active research. In this work, we present a rule discovery methodology and show that it is possible to construct decision support systems that detect such failures using information recorded from live disks. Any such prediction methodology should be both highly accurate and easy to interpret. Black-box models can deliver highly accurate solutions, but they do not expose the events that explain their decisions. To this end, we explore rule-based classifiers for predicting hard disk failures from various disk events, and we show that easy-to-understand rules can be learned from these events. Our evaluation shows that our system can be tuned either to have a high failure detection rate (i.e., classify a bad disk as bad) or to have a low false alarm rate (i.e., not classify a good disk as bad). We also propose a modification of the MLRules algorithm for classifying data with imbalanced class distributions. The existing algorithm assumes relatively balanced class distributions and equal misclassification costs, and therefore performs poorly on such datasets. Its performance can be considerably improved by introducing cost-sensitive learning into the existing framework.
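To make the trade-off between detection rate and false alarm rate concrete, the following is a minimal Python sketch of generic cost-sensitive thresholding. It is not the MLRules modification proposed in this work. The function names, the toy failure scores, and the cost values are all hypothetical and chosen only for illustration. It assumes a classifier that outputs an estimated failure probability per disk, and it predicts "failed" whenever the expected cost of a missed failure exceeds the expected cost of a false alarm.

```python
def detection_and_false_alarm(y_true, y_pred):
    """Return (detection rate, false alarm rate); label 1 = failed disk, 0 = good disk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bad disk flagged as bad
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bad disk missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good disk flagged as bad
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good disk left alone
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    return detection, false_alarm


def cost_sensitive_predict(scores, cost_fn, cost_fp):
    """Predict failure (1) when p * cost_fn > (1 - p) * cost_fp.

    This is equivalent to thresholding the failure probability p at
    cost_fp / (cost_fp + cost_fn). Costs are illustrative assumptions.
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical imbalanced toy data: 2 failed disks, 6 good disks.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.35]

# Equal costs give a threshold of 0.5; a missed failure costing 3x a false
# alarm lowers the threshold to 0.25.
for cost_fn, cost_fp in [(1.0, 1.0), (3.0, 1.0)]:
    y_pred = cost_sensitive_predict(scores, cost_fn, cost_fp)
    det, fa = detection_and_false_alarm(y_true, y_pred)
    print(f"cost_fn={cost_fn}, cost_fp={cost_fp}: detection={det:.2f}, false_alarm={fa:.2f}")
```

On this toy data, equal costs miss one of the two failed disks while raising no false alarms, whereas weighting missed failures more heavily catches both failed disks at the price of flagging two good ones. This mirrors, in a simplified form, the tuning choice described in the abstract.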