Scale validation in applied health research: tutorial for a 6-step R-based psychometrics protocol

Alexandra L. Dima, Health Services and Performance Research (HESPER EA 7425), Univ. Lyon, Université Claude Bernard Lyon 1, Lyon, France

http://orcid.org/0000-0002-3106-2242

Pages 136-161 | Received 24 Sep 2017, Accepted 24 Apr 2018, Published online: 10 May 2018

https://doi.org/10.1080/21642850.2018.1472602

References

Allaire, J. J., Horner, J., Marti, V., & Porte, N. (2015). Markdown: ‘Markdown’ rendering for R. Retrieved from https://CRAN.R-project.org/package=markdown
Anthoine, E., Moret, L., Regnault, A., Sébille, V., & Hardouin, J.-B. (2014). Sample size used to validate a scale: A review of publications on newly-developed patient reported outcomes measures. Health and Quality of Life Outcomes, 12. https://doi.org/10.1186/s12955-014-0176-2
Ark, L. A. v. d. (2007). Mokken scale analysis in R. Journal of Statistical Software, 20(11), 1–19.
Baker, F. B., & Kim, S.-H. (2017). The basics of item response theory using R. Springer International Publishing.
Bartholomew, D. J. (1998). Scaling unobservable constructs in social science. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(1), 1–13. https://doi.org/10.1111/1467-9876.00094
Bond, T., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York, NY: Routledge.
Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. Cambridge, UK: Cambridge University Press.
Borsboom, D. (2008). Latent variable theory. Measurement: Interdisciplinary Research and Perspectives, 6(1–2), 25–53. https://doi.org/10.1080/15366360802035497
Borsboom, D. (2017). A network theory of mental disorders. World Psychiatry: Official Journal of the World Psychiatric Association (WPA), 16(1), 5–13. https://doi.org/10.1002/wps.20375
Borsboom, D., Rhemtulla, M., Cramer, A. O. J., van der Maas, H. L. J., Scheffer, M., & Dolan, C. V. (2016). Kinds versus continua: A review of psychometric approaches to uncover the structure of psychiatric constructs. Psychological Medicine, 46(08), 1567–1579. https://doi.org/10.1017/S0033291715001944
Broadbent, E., Petrie, K. J., Main, J., & Weinman, J. (2006). The brief illness perception questionnaire. Journal of Psychosomatic Research, 60(6), 631–637. https://doi.org/10.1016/j.jpsychores.2005.10.020
Cappelleri, J. C., Jason Lundy, J., & Hays, R. D. (2014). Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clinical Therapeutics, 36(5), 648–662. https://doi.org/10.1016/j.clinthera.2014.04.006
Cella, D., Gershon, R., Lai, J.-S., & Choi, S. (2007). The future of outcomes measurement: Item banking, tailored short-forms, and computerized adaptive assessment. Quality of Life Research, 16(1), 133–141. https://doi.org/10.1007/s11136-007-9204-6
Cella, D., Riley, W., Stone, A., Rothrock, N., Reeve, B., Yount, S., … PROMIS Cooperative Group. (2010). The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63(11), 1179–1194. https://doi.org/10.1016/j.jclinepi.2010.04.011
Chalmers, R. P. (2012). Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software. https://doi.org/10.18637/jss.v048.i06
Chan, E. H. (2014). Standards and guidelines for validation practices: Development and evaluation of measurement instruments. In Validity and validation in social, behavioral, and health sciences (pp. 9–24). New York, NY: Springer International Publishing.
Chen, W.-H., Lenderking, W., Jin, Y., Wyrwich, K. W., Gelhorn, H., & Revicki, D. A. (2014). Is Rasch model analysis applicable in small sample size pilot studies for assessing item characteristics? An example using PROMIS pain behavior item bank data. Quality of Life Research, 23(2), 485–493. https://doi.org/10.1007/s11136-013-0487-5
Clatworthy, J., Buick, D., Hankins, M., Weinman, J., & Horne, R. (2005). The use and reporting of cluster analysis in health psychology: A review. British Journal of Health Psychology, 10(3), 329–358. https://doi.org/10.1348/135910705X25697
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104. https://doi.org/10.1037/0021-9010.78.1.98
Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., & Cramer, A. O. J. (2015). State of the art personality research: A tutorial on network analysis of personality data in R. Journal of Research in Personality, 54, 13–29. https://doi.org/10.1016/j.jrp.2014.07.003
Crutzen, R., & Peters, G.-J. Y. (2017). Scale quality: Alpha is an inadequate estimate and factor-analytic evidence is needed first of all. Health Psychology Review, 11(3), 242–247. https://doi.org/10.1080/17437199.2015.1124240
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. https://doi.org/10.1111/bjop.12046
Epskamp, S., Borsboom, D., & Fried, E. I. (2017). Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods, 1–18. https://doi.org/10.3758/s13428-017-0862-1
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. Chichester, UK: John Wiley & Sons.
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 1948550617693063. https://doi.org/10.1177/1948550617693063
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3), 286–299.
Fok, C. C. T., & Henry, D. (2015). Increasing the sensitivity of measures to change. Prevention Science: The Official Journal of the Society for Prevention Research, 16(7), 978–986. https://doi.org/10.1007/s11121-015-0545-z
Friedman, C., Rubin, J., Brown, J., Buntin, M., Corn, M., Etheredge, L., … Van Houweling, D. (2015). Toward a science of learning systems: A research agenda for the high-functioning learning health system. Journal of the American Medical Informatics Association, 22(1), 43–50. https://doi.org/10.1136/amiajnl-2014-002977
Fries, J. F., Krishnan, E., Rose, M., Lingala, B., & Bruce, B. (2011). Improved responsiveness and reduced sample size requirements of PROMIS physical function scales with item response theory. Arthritis Research & Therapy, 13, R147. https://doi.org/10.1186/ar3461
Frost, M. H., Reeve, B. B., Liepa, A. M., Stauffer, J. W., & Hays, R. D. (2007). What is sufficient evidence for the reliability and validity of patient-reported outcome measures? Value in Health, 10, S94–S105. https://doi.org/10.1111/j.1524-4733.2007.00272.x
Gandrud, C. (2013). Reproducible research with R and RStudio. Boca Raton, FL: CRC Press.
Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944. https://doi.org/10.1177/0013164406288165
Hamilton, K., Marques, M. M., & Johnson, B. T. (2017). Advanced analytic and statistical methods in health psychology. Health Psychology Review, 11(3), 217–221. https://doi.org/10.1080/17437199.2017.1348905
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38(9 Suppl), II28–II42.
Hemker, B. T., Sijtsma, K., & Molenaar, I. W. (1995). Selection of unidimensional scales from a multidimensional item bank in the polytomous Mokken IRT model. Applied Psychological Measurement, 19(4), 337–352. https://doi.org/10.1177/014662169501900404
Hobart, J., & Cano, S. (2009). Improving the evaluation of therapeutic interventions in multiple sclerosis: The role of new psychometric methods. Health Technology Assessment (Winchester, England), 13(12), iii, ix–x, 1–177. https://doi.org/10.3310/hta13120
Hogan, T. P., & Agnello, J. (2004). An empirical study of reporting practices concerning measurement validity. Educational and Psychological Measurement, 64(5), 802–812. https://doi.org/10.1177/0013164404264120
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Hutcheon, J. A., Chiolero, A., & Hanley, J. A. (2010). Random measurement error and regression dilution bias. BMJ, 340, c2289. https://doi.org/10.1136/bmj.c2289
Jackson, D. L., Gillaspy, J. A., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14(1), 6–23. https://doi.org/10.1037/a0014694
Jensen, M. P., Strom, S. E., Turner, J. A., & Romano, J. M. (1992). Validity of the Sickness Impact Profile Roland scale as a measure of dysfunction in chronic pain patients. Pain, 50(2), 157–162.
Kamata, A., & Bauer, D. J. (2008). A note on the relation between factor analytic and item response theory models. Structural Equation Modeling: A Multidisciplinary Journal, 15(1), 136–153. https://doi.org/10.1080/10705510701758406
Kelley, K., & Cheng, Y. (2012). Estimation of and confidence interval formation for reliability coefficients of homogeneous measurement instruments. Methodology, 8(2), 39–50. https://doi.org/10.1027/1614-2241/a000036
Leisch, F. (2002). Sweave: Dynamic generation of statistical reports using literate data analysis. In W. Härdle, & B. Rönz (Eds.), Compstat 2002 – proceedings in computational statistics (pp. 575–580). Heidelberg: Physica Verlag. Retrieved from http://www.stat.uni-muenchen.de/~leisch/Sweave
Li, C.-H. (2016). Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behavior Research Methods, 48(3), 936–949. https://doi.org/10.3758/s13428-015-0619-7
Linacre, J. M. (1994). Sample size and item calibration or person measure stability. Rasch Measurement Transactions, 7(4), 328. https://www.rasch.org/rmt/rmt74m.htm
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84–99. https://doi.org/10.1037/1082-989X.4.1.84
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., & Hornik, K. (2017). Cluster: Cluster analysis basics and extensions.
Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20(9). Retrieved from https://www.jstatsoft.org/article/view/v020i09
Marshall, M., Lockwood, A., Bradley, C., Adams, C., Joy, C., & Fenton, M. (2000). Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. The British Journal of Psychiatry, 176(3), 249–252. https://doi.org/10.1192/bjp.176.3.249
McHorney, C. A., & Tarlov, A. R. (1995). Individual-patient monitoring in clinical practice: Are available health status surveys adequate? Quality of Life Research, 4(4), 293–307. https://doi.org/10.1007/BF01593882
Meijer, R. R., & Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item response theory modeling. Psychological Methods, 9(3), 354–368. https://doi.org/10.1037/1082-989X.9.3.354
Meijer, R. R., Niessen, A. S. M., & Tendeiro, J. N. (2016). A practical guide to check the consistency of item response patterns in clinical research through person-fit statistics: Examples and a computer program. Assessment, 23(1), 52–62. https://doi.org/10.1177/1073191115577800
Melzack, R. (1987). The short-form McGill pain questionnaire. Pain, 30(2), 191–197. https://doi.org/10.1016/0304-3959(87)91074-8
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … Vet, H. C. W. d. (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Quality of Life Research, 19(4), 539–549. https://doi.org/10.1007/s11136-010-9606-8
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY, US: McGraw-Hill, Inc.
Peters, G.-J. Y., Dima, A. L., Plass, A. M., Crutzen, R., Gibbons, C., & Doyle, F. (2016). Measurement in health psychology: Combining theory, qualitative, and quantitative methods to do it right: Methods in health psychology symposium VI. The European Health Psychologist, 18(6), 235–246.
Rabin, R., & Charro, F. d. (2001). EQ-5D: A measure of health status from the EuroQol group. Annals of Medicine, 33(5), 337–343. https://doi.org/10.3109/07853890109002087
R Core Team. (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org
Reeve, B. B., Wyrwich, K. W., Wu, A. W., Velikova, G., Terwee, C. B., Snyder, C. F., … Butt, Z. (2013). ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Quality of Life Research, 22(8), 1889–1905. https://doi.org/10.1007/s11136-012-0344-y
Reise, S. P., Ainsworth, A. T., & Haviland, M. G. (2005). Item response theory fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95–101. https://doi.org/10.1111/j.0963-7214.2005.00342.x
Revelle, W. (2017). Psych: Procedures for psychological, psychometric, and personality research. Evanston, IL: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145. https://doi.org/10.1007/s11336-008-9102-z
Rizopoulos, D. (2007). Ltm: An R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17(5). https://doi.org/10.18637/jss.v017.i05
Roland, M., & Morris, R. (1983). A study of the natural history of back pain. Part I: Development of a reliable and sensitive measure of disability in low-back pain. Spine, 8(2), 141–144.
Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Sawatzky, R., Chan, E. K. H., Zumbo, B. D., Ahmed, S., Bartlett, S. J., Bingham, C. O., … Lix, L. M. (2016). Modern perspectives of measurement validation emphasize justification of inferences based on patient-reported outcome scores: Seventh paper in a series on patient reported outcomes. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2016.12.002
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304–321. https://doi.org/10.1177/0734282911406653
Schuur, W. H. v. (2003). Mokken scale analysis: Between the Guttman scale and parametric item response theory. Political Analysis, 11(2), 139–163. https://doi.org/10.1093/pan/mpg002
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107. https://doi.org/10.1007/s11336-008-9101-0
Sijtsma, K., & Hemker, B. T. (1998). Nonparametric polytomous IRT models for invariant item ordering, with results for parametric models. Psychometrika, 63(2), 183–200. https://doi.org/10.1007/BF02294774
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: SAGE.
Singh, J. (2004). Tackling measurement problems with item response theory. Journal of Business Research, 57(2), 184–208. https://doi.org/10.1016/S0148-2963(01)00302-2
Skevington, S. M., Lotfy, M., & O’Connell, K. A. (2004). The World Health Organization’s WHOQOL-BREF quality of life assessment: Psychometric properties and results of the international field trial. A report from the WHOQOL group. Quality of Life Research, 13(2), 299–310. https://doi.org/10.1023/B:QURE.0000018486.91360.00
Stochl, J., Jones, P. B., & Croudace, T. J. (2012). Mokken scale analysis of mental health and well-being questionnaire item responses: A non-parametric IRT method in empirical research for applied health researchers. BMC Medical Research Methodology, 12, 74. https://doi.org/10.1186/1471-2288-12-74
Straat, J. H., van der Ark, L. A., & Sijtsma, K. (2014). Minimum sample size requirements for Mokken scale analysis. Educational and Psychological Measurement, 74(5), 809–822. https://doi.org/10.1177/0013164414529793
Stroud, M. W., McKnight, P. E., & Jensen, M. P. (2004). Assessment of self-reported physical activity in patients with chronic pain: Development of an abbreviated Roland-Morris disability scale. The Journal of Pain, 5(5), 257–263. https://doi.org/10.1016/j.jpain.2004.04.002
Torfs, P., & Brauer, C. (2014, March 3). A (very) short introduction to R. Retrieved from https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
Watson, R., van der Ark, L. A., Lin, L.-C., Fieo, R., Deary, I. J., & Meijer, R. R. (2012). Item response theory: How Mokken scaling can be used in clinical practice. Journal of Clinical Nursing, 21(19pt20), 2736–2746. https://doi.org/10.1111/j.1365-2702.2011.03893.x
