
How sure can we be that a student really failed? On the measurement precision of individual pass-fail decisions from the perspective of Item Response Theory


