Methodological Studies

When Should Evaluators Lose Sleep Over Measurement? Toward Establishing Best Practices

Received 07 Nov 2022, Accepted 03 Apr 2024, Published online: 08 May 2024

