
Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data

Pages 1-22 | Published online: 15 Mar 2017

References

  • Alcalá, J., A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera. 2010. KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing 17 (2–3):255–287.
  • Alzola, C., and F. E. Harrell. 1999. An introduction to S-Plus and the Hmisc and Design libraries, Citeseer.
  • Bache, K., and M. Lichman. 2013. UCI machine learning repository. http://archive.ics.uci.edu/ml.
  • Baraldi, A. N., and C. K. Enders. 2010. An introduction to modern missing data analyses. Journal of School Psychology 48 (1):5–37. doi:10.1016/j.jsp.2009.10.001.
  • Batista, G. E. A. P. A., and M. C. Monard. 2003. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 17 (5–6):519–533. doi:10.1080/713827181.
  • Breiman, L. 2001. Random forests. Machine Learning 45 (1):5–32. doi:10.1023/A:1010933404324.
  • Casella, G., and E. I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46 (3):167–174.
  • Ding, Y., and J. S. Simonoff. 2010. An investigation of missing data methods for classification trees applied to binary response data. The Journal of Machine Learning Research 11:131–170.
  • Farhangfar, A., L. Kurgan, and J. Dy. 2008. Impact of imputation of missing values on classification error for discrete data. Pattern Recognition 41 (12):3692–3705. ISSN 00313203. doi:10.1016/j.patcog.2008.05.019.
  • Farhangfar, A., L. A. Kurgan, and W. Pedrycz. 2007. A novel framework for imputation of missing values in databases. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions 37 (5):692–709. doi:10.1109/TSMCA.2007.902631.
  • Friedman, N., D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning 29 (2–3):131–163. doi:10.1023/A:1007465528199.
  • Garcia-Laencina, P. J., J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal. 2010. Pattern classification with missing data: A review. Neural Computing and Applications 19 (2):263–282. doi:10.1007/s00521-009-0295-6.
  • Gheyas, I. A., and L. S. Smith. 2010. A neural network-based framework for the reconstruction of incomplete data sets. Neurocomputing 73 (16):3039–3065. doi:10.1016/j.neucom.2010.06.021.
  • Honaker, J., G. King, M. Blackwell, et al. 2011. Amelia II: A program for missing data. Journal of Statistical Software 45 (7):1–47.
  • Hornik, K., C. Buchta, and A. Zeileis. 2009. Open-source machine learning: R meets Weka. Computational Statistics 24 (2):225–232. doi:10.1007/s00180-008-0119-7.
  • Li, K.-H. 1988. Imputation using Markov chains. Journal of Statistical Computation and Simulation 30 (1):57–79.
  • Little, R. J. A., and D. B. Rubin. 1987. Statistical analysis with missing data. In Wiley series in probability and statistics – Applied probability and statistics section series. New York: John Wiley & Sons, Inc. ISBN 9780471802549. http://books.google.ca/books?id=w40QAQAAIAAJ.
  • Liu, P., and L. Lei. 2006. Missing data treatment methods and nbi model. In Intelligent Systems Design and Applications, 2006. ISDA’06. Sixth International Conference on, vol. 1, 633–638. IEEE.
  • Luengo, J., S. García, and F. Herrera. 2010. A study on the use of imputation methods for experimentation with radial basis function network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method. Neural Networks 23 (3):406–418. doi:10.1016/j.neunet.2009.11.014.
  • Luengo, J., S. García, and F. Herrera. 2012. On the choice of the best imputation methods for missing values considering three groups of classification methods. Knowledge and Information Systems 32 (1):77–108. doi:10.1007/s10115-011-0424-2.
  • Matsubara, E. T., R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard. 2008. Missing value imputation using a semi-supervised rank aggregation approach. In Advances in artificial intelligence-SBIA, Brazilian Symposium on Artificial Intelligence, vol. 2008, 217–226. Berlin Heidelberg: Springer.
  • Robitzsch, A., T. Kiefer, A. C. George, A. Uenlue, and M. A. Robitzsch. 2012. Package CDM. http://cran.r-project.org/web/packages/CDM/index.html.
  • Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics), New York: John Wiley & Sons, Inc.
  • Rubin, D. B., and J. L. Schafer. 1990. Efficiently creating multiple imputations for incomplete multivariate normal data. In Proceedings of the Statistical Computing Section of the American Statistical Association, vol. 83, 88.
  • Sande, I. G. 1983. Hot-deck imputation procedures. Incomplete Data in Sample Surveys 3:334–350.
  • Song, Q., M. Shepperd, X. Chen, and J. Liu. 2008. Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation. Journal of Systems and Software 81 (12):2361–2370. doi:10.1016/j.jss.2008.05.008.
  • Stekhoven, D. J., and P. Bühlmann. 2012. MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics 28 (1):112–118. doi:10.1093/bioinformatics/btr597.
  • Templ, M., A. Alfons, A. Kowarik, and B. Prantner. 2011. Vim: Visualization and imputation of missing values. R Package Version 20 (3).
  • Twala, B. 2009. An empirical comparison of techniques for handling incomplete data using decision trees. Applied Artificial Intelligence 23 (5):373–405. doi:10.1080/08839510902872223.
  • Van Buuren, S., and K. Oudshoorn. 1999. Flexible multivariate imputation by mice. Leiden, The Netherlands: TNO Prevention Center.
  • Vapnik, V. N. 1996. Book review: The nature of statistical learning theory. Technometrics 38 (4):400. ISSN 00401706.
  • Vomlel, J. 2002. Evidence propagation in Bayesian networks for computerized adaptive testing, Citeseer, vol. 12.
  • Witten, I. H., and E. Frank. 2005. Data mining: Practical machine learning tools and techniques, In Morgan Kaufmann series in data management systems, 2nd ed. San Francisco, CA: Elsevier/Morgan Kaufmann.
