
AN EMPIRICAL COMPARISON OF TECHNIQUES FOR HANDLING INCOMPLETE DATA USING DECISION TREES

Pages 373-405 | Published online: 28 Apr 2009

REFERENCES

  • Arbuckle, J. L. and W. Wothke. 1999. Amos 4.0 User's Guide. Chicago, IL: Smallwaters.
  • Batista, G. and M. C. Monard. 2003. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 17: 519–533.
  • Becker, R., J. Chambers, and A. Wilks. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole.
  • Blake, C. L. and C. J. Merz. 1998. UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. (http://www.ics.uci.edu/mlearn/MLRepository.html).
  • Breiman, L. 1996. Bagging predictors. Machine Learning 24(2): 123–140.
  • Breiman, L., J. Friedman, R. Olshen, and C. Stone. 1984. Classification and Regression Trees. Wadsworth International Group; Pacific Grove, CA, USA: Wadsworth & Brooks/Cole Advanced Books & Software.
  • Cestnik, B., I. Kononenko, and I. Bratko. 1987. Assistant 86: A knowledge-elicitation tool for sophisticated users. In I. Bratko and N. Lavrac, eds. European Working Session on Learning (EWSL87). Wilmslow, UK: Sigma Press.
  • Cheeseman, P., J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. 1988. Bayesian classification. In Proceedings of the American Association for Artificial Intelligence (AAAI), 607–611. San Mateo, CA: Morgan Kaufmann Publishers.
  • Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39: 1–38.
  • El-Emam, K. and A. Birk. 2000. Validating the ISO/IEC 15504 measures of software development process capability. Journal of Systems and Software 51(2): 119–149.
  • Fujikawa, Y. and T. B. Ho. 2002. Cluster-based algorithms for filling missing values. In 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Taiwan, 6–9 May. Lecture Notes in Artificial Intelligence 2336: 549–554.
  • Gehrke, J., W.-Y. Loh, and R. Ramakrishnan. 1999. Classification and regression: Money can grow on trees. Tutorial notes of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 1–73.
  • Kalousis, A. and M. Hilario. 2000. Supervised knowledge discovery from incomplete data. In Proceedings of the 2nd International Conference on Data Mining 2000. Cambridge, UK: WIT Press.
  • Kim, J.-O. and J. Curry. 1977. The treatment of missing data in multivariate analysis. Sociological Methods and Research 6: 215–240.
  • Kirk, R. E. 1982. Experimental Design, 2nd ed. Monterey, CA: Brooks/Cole Publishing Company.
  • Lakshminarayan, K., S. A. Harp, and T. Samad. 1999. Imputation of missing data in industrial databases. Applied Intelligence 11: 259–275.
  • Little, R. J. A. and D. B. Rubin. 1987. Statistical Analysis with Missing Data. New York: Wiley.
  • Lobo, O. O. and M. Numao. 1999. Ordered estimation of missing values. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining. Lecture Notes in Computer Science 1574: 274–278.
  • Lobo, O. O. and M. Numao. 2000. On the applicability of a machine learning method for estimating missing values. International Machine Learning Conference 2000, Palo Alto, CA.
  • Loh, W.-Y. and N. Vanichsetakul. 1988. Tree-structured classification via generalised discriminant analysis. Journal of the American Statistical Association 83: 715–728.
  • MINITAB. 2002. MINITAB Statistical Software for Windows 9.0. State College, PA 16801-3008, USA: MINITAB, Inc.
  • Myrtveit, I., E. Stensrud, and U. Olsson. 2001. Analyzing data sets with missing data: An empirical evaluation of imputation methods and likelihood-based methods. IEEE Transactions on Software Engineering 27(11): 999–1013.
  • Pyle, D. 1999. Data Preparation for Data Mining. San Francisco: Morgan Kaufmann.
  • Quinlan, J. R. 1987. Simplifying decision trees. International Journal of Man-Machine Studies 27: 221–234.
  • Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann Publishers.
  • Robins, J. M. and N. Wang. 2000. Inference for imputation estimators. Biometrika 87: 113–124.
  • Roth, P. L. 1994. Missing data: A conceptual overview for applied psychologists. Personnel Psychology 47: 537–560.
  • Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
  • Rubin, D. B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91: 473–489.
  • Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. London: Chapman and Hall.
  • Schafer, J. L. and M. K. Olsen. 1998. Multiple imputation for multivariate missing data problems: A data analyst's perspective. Multivariate Behavioral Research 33(4): 545–571.
  • Schafer, J. L. and J. W. Graham. 2002. Missing data: Our view of the state of the art. Psychological Methods 7(2): 147–177.
  • Sentas, P., L. Angelis, and I. Stamelos. 2004. Multiple logistic regression as imputation method applied on software effort prediction. In Proceedings of the 10th International Symposium on Software Metrics, Chicago, 14–16 September.
  • Shapiro, A. 1987. Structured Induction in Expert Systems. London: Addison-Wesley.
  • Song, Q. and M. Shepperd. 2004. A short note on safest default missingness mechanism assumptions. Empirical Software Engineering 10(2): 235–243.
  • S-PLUS. 2003. S-PLUS 6.2 for Windows. Seattle, WA: MathSoft, Inc.
  • Tanner, M. A. and W. H. Wong. 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82: 528–550.
  • Therneau, T. M. and E. J. Atkinson. 1997. An introduction to recursive partitioning using the RPART routines. Technical Report, Mayo Foundation. Department of Statistics, Stanford University, USA.
  • Twala, B. 2005. Effective techniques for handling incomplete data using decision trees. Unpublished PhD thesis, Open University, Milton Keynes, UK.
  • Twala, B., M. Cartwright, and M. Shepperd. 2005. Comparison of various methods for handling incomplete data in software engineering databases. In 4th International Symposium on Empirical Software Engineering, Noosa Heads, Australia, November.
  • Twala, B., M. C. Jones, and D. J. Hand. 2008. Good methods for coping with missing data in decision trees. Pattern Recognition Letters 29: 950–956.
  • Venables, W. N. and B. D. Ripley. 1994. Modern Applied Statistics with S-PLUS. New York: Springer.
  • Wu, C. F. J. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11: 95–103.
