Original Articles

Significance tests or confidence intervals: which are preferable for the comparison of classifiers?

Pages 189-206 | Received 22 Jul 2011, Accepted 26 Feb 2012, Published online: 26 Apr 2012

