Methodological Studies

When Should Evaluators Lose Sleep Over Measurement? Toward Establishing Best Practices

Received 07 Nov 2022, Accepted 03 Apr 2024, Published online: 08 May 2024
 

Abstract

Evaluators often invest considerable effort in designing evaluation studies. However, there is evidence that less attention is paid to measurement. One possible explanation is that applied psychometrics focuses on reliability, with less emphasis on measurement model misspecification and the bias it can introduce into estimates that use the resulting scores. Another possible explanation is that evaluators frequently want to use the simplest scoring approach possible under the assumption that it is transparent and therefore relies on fewer assumptions, a mindset that is often, if not always, misguided. In this study, we walk through the decisions involved in producing scores for program evaluation studies in an attempt to demystify the psychometrics and to show how those decisions can be consequential. We use Monte Carlo simulations to illustrate the effects of those decisions in a randomized controlled trial, then show that these decisions can impact published evaluation results. Finally, we offer evaluators best practices in scoring for evaluation, including guidance on when deviating from those practices is most likely to affect their work.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1 An important caveat is that testing companies often encourage the use of scores to do aggregate subgroup comparisons within schools and districts, which can be considered non-randomized control-treatment contrasts.

2 Some sort of link function like a logit link would also be required.

3 Even though IRT and SEM models incorporate stochastic terms to reflect item-level variation that is unrelated to the latent construct, note that these terms reflect more than just random error. They capture all item-specific variation (both random noise, as well as systematic variation that is due to the unique aspects of an item) and perhaps even information that may be common across subsets of items in violation of the local independence assumption.

4 When using standardized loadings, one would be constraining the loadings equal, not necessarily to one.

5 This equation is the SEM formulation, but one can convert it to IRT straightforwardly (Wirth & Edwards, 2007).

6 Here, “basic” is defined as a simple unidimensional model like the one in Equation 1.

7 Technically, there are also important decisions related to parameter estimation and the type of IRT model, such as whether to use a graded response versus partial credit model. For more discussion of estimation issues, see Wirth and Edwards (2007).

8 Note that, if the RCT is done perfectly, then the means and variances for the control and treatment groups at Time 1 would be identical even if freely estimated for the treatment group. Assuming an intervention effect, the same would not be true post-intervention.

9 According to Vector Psychometric Group (2024), MLE is not recommended for use with multidimensional models because it does not use information from the population distribution, which can cause convergence issues in score estimation and undefined standard errors.

10 The specification of this model can be found in the supplemental materials.

11 Complete estimates can be found in SOM Table C1.

12 one of four conditions: universal intervention, selective intervention, combined (universal and selective) intervention, and no-intervention control.

13 Note that this is a 90% CI, chosen so as to apply the least stringent threshold for significance.

14 Technically, Whitney and Candelaria (2017) used mean scores, not total scores.

15 In more technical terms, that includes items with high loadings/discrimination parameters.

16 Note that this model imposes a proportional odds assumption because each item has only a single slope parameter, despite having multiple threshold parameters.
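
As an illustration of the point in Note 16, the following is a minimal sketch, not the authors' specification, of a polytomous item with one slope and several thresholds. It assumes a graded-response-style parameterization with a logistic link; the function names, the symbols (theta, slope, thresholds), and the numeric values are hypothetical and chosen only for illustration. Because every threshold shares the same slope, the gap between any two cumulative log-odds is the same at every level of the latent trait, which is the proportional odds restriction.

```python
import math


def cumulative_probs(theta, slope, thresholds):
    """Return P(Y >= k | theta) for each threshold (assumed logistic link)."""
    # A single slope multiplies every threshold contrast: this is the
    # proportional odds restriction described in Note 16.
    return [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]


def category_probs(theta, slope, thresholds):
    """Return P(Y = k | theta) for k = 0..K-1 by differencing cumulative probabilities."""
    # Pad with P(Y >= 0) = 1 and P(Y >= K) = 0, then take adjacent differences.
    cum = [1.0] + cumulative_probs(theta, slope, thresholds) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def cumulative_log_odds(theta, slope, thresholds):
    """Return the log-odds of Y >= k for each threshold."""
    return [math.log(p / (1.0 - p)) for p in cumulative_probs(theta, slope, thresholds)]


if __name__ == "__main__":
    # Hypothetical item: one slope, three increasing thresholds (four categories).
    slope, thresholds = 1.2, [-1.0, 0.0, 1.0]
    for theta in (-2.0, 0.0, 1.5):
        log_odds = cumulative_log_odds(theta, slope, thresholds)
        gap = log_odds[0] - log_odds[1]
        print(theta, [round(p, 3) for p in category_probs(theta, slope, thresholds)], round(gap, 3))
```

In this sketch the gap between the first two cumulative log-odds equals the slope times the distance between the first two thresholds, regardless of theta. Relaxing the assumption would require letting the slope vary by threshold, which the model described in Note 16 does not do.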
