Abstract
This article develops an effective procedure for two-class classification problems with highly imbalanced class sizes. In many such problems, the majority class represents “normal” cases, while the minority class represents “abnormal” cases whose detection is critical to decision making. When the class sizes are highly imbalanced, conventional classification methods tend to strongly favor the majority class, resulting in very low or even no detection of the minority class. The objective of this article is to devise a systematic procedure that substantially improves the power of detecting the minority class, so that the procedure can be used to screen the original data set and select a much smaller subset for further investigation. The proposed procedure is based on ensemble classifiers, where each classifier is constructed from a resized training set in a reduced-dimension space. The article also specifies how to find the best values of the decision variables in the proposed procedure. The proposed method is compared with a set of off-the-shelf classification methods on two real data sets, and its prediction results show remarkable improvements over the other methods: it detects about 75% of the minority class units, whereas the other methods yield much lower detection rates.
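As a rough illustration of the general idea of an ensemble built from resized, reduced-dimension training sets, the following minimal sketch may help. It is not the procedure developed in the article: the abstract does not name the base classifier, the resizing rule, or the dimension-reduction step, so the sketch assumes random undersampling of the majority class, a random feature subset per ensemble member, a nearest-centroid base classifier, and a voting threshold. All function names and parameters (`fit_ensemble`, `n_members`, `n_features`, `threshold`) are hypothetical.

```python
import random

# Illustrative sketch only. Undersampling, random feature subsets, the
# nearest-centroid base classifier, and the vote threshold are assumptions,
# not details taken from the article.

def centroid(rows, features):
    """Mean of the selected features over the given rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in features]

def fit_member(majority, minority, n_features, rng):
    """Fit one base classifier on a resized, reduced-dimension training set."""
    # Resize: draw as many majority units as there are minority units
    # (requires len(majority) >= len(minority)).
    sampled_majority = rng.sample(majority, len(minority))
    # Reduce dimension: keep a random subset of feature indices.
    features = rng.sample(range(len(minority[0])), n_features)
    return (features,
            centroid(sampled_majority, features),
            centroid(minority, features))

def member_vote(member, x):
    """Return 1 if x is closer to the minority centroid, else 0."""
    features, c_major, c_minor = member
    d_major = sum((x[j] - c) ** 2 for j, c in zip(features, c_major))
    d_minor = sum((x[j] - c) ** 2 for j, c in zip(features, c_minor))
    return 1 if d_minor < d_major else 0

def fit_ensemble(majority, minority, n_members=25, n_features=2, seed=0):
    """Build the ensemble; each member sees its own resample and features."""
    rng = random.Random(seed)
    return [fit_member(majority, minority, n_features, rng)
            for _ in range(n_members)]

def predict(ensemble, x, threshold=0.5):
    """Flag x as minority (1) when the share of minority votes reaches threshold."""
    share = sum(member_vote(m, x) for m in ensemble) / len(ensemble)
    return 1 if share >= threshold else 0

def detection_and_false_alarm(ensemble, majority, minority, threshold=0.5):
    """Detection power on minority units and false alarm rate on majority units."""
    detection = sum(predict(ensemble, x, threshold) for x in minority) / len(minority)
    false_alarm = sum(predict(ensemble, x, threshold) for x in majority) / len(majority)
    return detection, false_alarm
```

In a sketch like this, `n_members`, `n_features`, and `threshold` play the role of tunable decision variables: lowering `threshold` would be expected to raise detection power at the cost of a higher false alarm rate. How the article actually chooses its decision variables is not described in the abstract.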
Notes
a The warranty data set consists of two data sets from two lots. The records from the two lots are analyzed separately.
a Values in each cell are the detection power and the false alarm rate, respectively.
b The cases with quadratic and radial basis kernel functions are omitted to save space.
c The out-of-bag estimates corresponding to the prediction model chosen by the optimization procedure.