Methodological Studies

When Should Evaluators Lose Sleep Over Measurement? Toward Establishing Best Practices

Received 07 Nov 2022, Accepted 03 Apr 2024, Published online: 08 May 2024

