Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data
Pages 1-22 | Published online: 15 Mar 2017
 

ABSTRACT

This paper evaluates the effect on the predictive accuracy of different models of two recently proposed imputation methods, missForest (MF) and Multiple Imputation based on Expectation-Maximization (MIEM), along with two other imputation methods: Sequential Hot-deck and Multiple Imputation based on Logistic Regression (MILR). Their effect is assessed on the classification accuracy of four models: Tree Augmented Naive Bayes (TAN), which has received little attention, Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. Experiments are conducted on fourteen binary datasets with large feature sets and across a wide range of missing data rates (between 5% and 50%). The results from 10-fold cross-validation show that the performance of the imputation methods varies substantially between classifiers and across rates of missing values. MIEM generally gives the best results for all classifiers across all rates of missing data. While the NB model does not benefit much from imputation compared to a no-imputation baseline, LR and TAN gain considerably from imputation at higher rates of missing values. The results also show that MF works best with TAN, and that Hot-deck degrades the predictive performance of the SVM and NB models at high rates of missing values (over 30%). A detailed analysis of the imputation methods over the different datasets is reported, and the implications of these findings for the choice of an imputation method are discussed.
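For readers who want to reproduce the general shape of this protocol, the following is a minimal sketch, not the authors' code. It assumes scikit-learn is available, approximates missForest with IterativeImputer driven by a random-forest regressor (MIEM, MILR, and Sequential Hot-deck are not shown), and uses a stand-in binary dataset with missingness injected completely at random before 10-fold cross-validation of LR and SVM-RBF classifiers.

```python
# Sketch of the evaluation protocol: inject missing values at a chosen rate,
# impute, then score a classifier with 10-fold cross-validation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)  # stand-in binary dataset (assumption)

# Inject values missing completely at random at one of the studied rates (e.g. 30%).
missing_rate = 0.30
mask = rng.random(X.shape) < missing_rate
X_missing = X.copy()
X_missing[mask] = np.nan

# missForest-style imputation approximated by iterative random-forest imputation.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)

for name, clf in [
    ("LR", LogisticRegression(max_iter=1000)),
    ("SVM-RBF", SVC(kernel="rbf")),
]:
    pipe = Pipeline([("impute", imputer), ("scale", StandardScaler()), ("clf", clf)])
    scores = cross_val_score(pipe, X_missing, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Placing the imputer inside the cross-validation pipeline (rather than imputing once on the full dataset) keeps the imputation model from seeing the test folds, which mirrors how such comparisons are normally made to avoid optimistic bias.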
