Applied Artificial Intelligence

An International Journal

Volume 31, 2017 - Issue 1

Performance Comparison of Recent Imputation Methods for Classification Tasks over Binary Data

Soroosh Ghorbani (corresponding author), Computer Engineering Department, Polytechnique Montreal, Montreal, QC, Canada

Michel C. Desmarais, Computer Engineering Department, Polytechnique Montreal, Montreal, QC, Canada

Pages 1-22 | Published online: 15 Mar 2017

References

Alcalá, J., A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera. 2010. KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing 17 (2–3):255–287.
Alzola, C., and F. E. Harrell. 1999. An introduction to S-Plus and the Hmisc and Design libraries, Citeseer.
Bache, K., and M. Lichman. 2013. UCI machine learning repository. http://archive.ics.uci.edu/ml.
Baraldi, A. N., and C. K. Enders. 2010. An introduction to modern missing data analyses. Journal of School Psychology 48 (1):5–37. doi:10.1016/j.jsp.2009.10.001.
Batista, G. E. A. P. A., and M. C. Monard. 2003. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 17 (5–6):519–533. doi:10.1080/713827181.
Breiman, L. 2001. Random forests. Machine Learning 45 (1):5–32. doi:10.1023/A:1010933404324.
Casella, G., and E. I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46 (3):167–174.
Ding, Y., and J. S. Simonoff. 2010. An investigation of missing data methods for classification trees applied to binary response data. The Journal of Machine Learning Research 11:131–170.
Farhangfar, A., L. Kurgan, and J. Dy. 2008. Impact of imputation of missing values on classification error for discrete data. Pattern Recognition 41 (12):3692–3705. ISSN 00313203. doi:10.1016/j.patcog.2008.05.019.
Farhangfar, A., L. A. Kurgan, and W. Pedrycz. 2007. A novel framework for imputation of missing values in databases. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 37 (5):692–709. doi:10.1109/TSMCA.2007.902631.
Friedman, N., D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning 29 (2–3):131–163. doi:10.1023/A:1007465528199.
García-Laencina, P. J., J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal. 2010. Pattern classification with missing data: A review. Neural Computing and Applications 19 (2):263–282. doi:10.1007/s00521-009-0295-6.
Gheyas, I. A., and L. S. Smith. 2010. A neural network-based framework for the reconstruction of incomplete data sets. Neurocomputing 73 (16):3039–3065. doi:10.1016/j.neucom.2010.06.021.
Honaker, J., G. King, M. Blackwell, et al. 2011. Amelia II: A program for missing data. Journal of Statistical Software 45 (7):1–47.
Hornik, K., C. Buchta, and A. Zeileis. 2009. Open-source machine learning: R meets Weka. Computational Statistics 24 (2):225–232. doi:10.1007/s00180-008-0119-7.
Kim-Hung, L. 1988. Imputation using Markov chains. Journal of Statistical Computation and Simulation 30 (1):57–79.
Little, R. J. A., and D. B. Rubin. 1987. Statistical analysis with missing data. In Wiley series in probability and statistics – Applied probability and statistics section series. New York: John Wiley & Sons, Inc. ISBN 9780471802549. http://books.google.ca/books?id=w40QAQAAIAAJ.
Liu, P., and L. Lei. 2006. Missing data treatment methods and NBI model. In Intelligent Systems Design and Applications, 2006. ISDA’06. Sixth International Conference on, vol. 1, 633–638. IEEE.
Luengo, J., S. García, and F. Herrera. 2010. A study on the use of imputation methods for experimentation with radial basis function network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method. Neural Networks 23 (3):406–418. doi:10.1016/j.neunet.2009.11.014.
Luengo, J., S. García, and F. Herrera. 2012. On the choice of the best imputation methods for missing values considering three groups of classification methods. Knowledge and Information Systems 32 (1):77–108. doi:10.1007/s10115-011-0424-2.
Matsubara, E. T., R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard. 2008. Missing value imputation using a semi-supervised rank aggregation approach. In Advances in artificial intelligence-SBIA, Brazilian Symposium on Artificial Intelligence, vol. 2008, 217–226. Berlin Heidelberg: Springer.
Robitzsch, A., T. Kiefer, A. C. George, A. Uenlue, and M. A. Robitzsch. 2012. Package CDM. http://cran.r-project.org/web/packages/CDM/index.html.
Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics). New York: John Wiley & Sons, Inc.
Rubin, D. B., and J. L. Schafer. 1990. Efficiently creating multiple imputations for incomplete multivariate normal data. In Proceedings of the Statistical Computing Section of the American Statistical Association, vol. 83, 88.
Sande, I. G. 1983. Hot-deck imputation procedures. Incomplete Data in Sample Surveys 3:334–350.
Song, Q., M. Shepperd, X. Chen, and J. Liu. 2008. Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation. Journal of Systems and Software 81 (12):2361–2370. doi:10.1016/j.jss.2008.05.008.
Stekhoven, D. J., and P. Bühlmann. 2012. MissForest: Non-parametric missing value imputation for mixed-type data. Bioinformatics 28 (1):112–118. doi:10.1093/bioinformatics/btr597.
Templ, M., A. Alfons, A. Kowarik, and B. Prantner. 2011. VIM: Visualization and imputation of missing values. R Package Version 20 (3).
Twala, B. 2009. An empirical comparison of techniques for handling incomplete data using decision trees. Applied Artificial Intelligence 23 (5):373–405. doi:10.1080/08839510902872223.
Van Buuren, S., and K. Oudshoorn. 1999. Flexible multivariate imputation by MICE. Leiden, The Netherlands: TNO Prevention Center.
Vapnik, V. N. 1996. Book review: The nature of statistical learning theory. Technometrics 38 (4):400–400. ISSN 00401706.
Vomlel, J. 2002. Evidence propagation in Bayesian networks for computerized adaptive testing, Citeseer, vol. 12.
Witten, I. H., and E. Frank. 2005. Data mining: Practical machine learning tools and techniques. In Morgan Kaufmann series in data management systems, 2nd ed. San Francisco, CA: Elsevier/Morgan Kaufmann.
