Abstract
A challenge facing nearly all studies in the psychological sciences is how best to combine multiple items into a valid and reliable score for use in subsequent modeling. The most ubiquitous method is to compute a mean of items, but more contemporary approaches use various forms of latent score estimation. Regardless of approach, outside of large-scale testing applications, scoring models rarely include background characteristics to improve score quality. This article used a Monte Carlo simulation design to study score quality for different psychometric models that did and did not include covariates across levels of sample size, number of items, and degree of measurement invariance. The inclusion of covariates improved score quality for nearly all design factors, and in no case did the covariates degrade score quality relative to ignoring these influences entirely. Results suggest that the inclusion of observed covariates can improve factor score estimation.
FUNDING
This research was supported by R01DA034636 (Daniel J. Bauer, Principal Investigator).
Notes
1 An important exception to this is clearly evident in the field of plausible values (e.g., Mislevy, 1991; Mislevy, Beaton, Kaplan, & Sheehan, 1992). Although exogenous covariates are commonly used in large-scale testing applications such as the National Assessment of Educational Progress (e.g., Mislevy, Johnson, & Muraki, 1992), these applications are characterized by extremely large sample sizes and planned missing designs, neither of which characterizes the vast majority of typical scoring applications within the social sciences.
2 Item communalities were computed as follows: For each binary item, assume a continuous latent response that produces an observed value of zero or one according to whether it falls below or above a fixed threshold. The communality then represents the proportion of variance in the continuous latent response due to the common latent factor. These communality values are thus directly comparable to those commonly reported for linear factor analyses.
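This calculation can be sketched concretely. The snippet below is an illustrative example, not the authors' code: it assumes a probit-style latent response y* = λη + ε with a standard normal factor η and a unit-variance residual ε, so the communality is λ²·Var(η) / (λ²·Var(η) + Var(ε)). The loading and threshold values are arbitrary choices for demonstration.

```python
import numpy as np

def communality(loading, factor_var=1.0, residual_var=1.0):
    """Proportion of latent-response variance due to the common factor."""
    common = loading**2 * factor_var
    return common / (common + residual_var)

# Hypothetical values chosen for illustration only.
lam, tau = 0.8, 0.25

# Generate continuous latent responses and dichotomize at the threshold,
# mirroring the latent-response formulation described in the note.
rng = np.random.default_rng(0)
eta = rng.standard_normal(100_000)            # common latent factor
y_star = lam * eta + rng.standard_normal(100_000)  # latent response
y = (y_star > tau).astype(int)                # observed binary item

print(round(communality(lam), 3))  # 0.8^2 / (0.8^2 + 1) ≈ 0.39
```

Because the residual variance is fixed at one in this parameterization, the communality depends only on the loading, which is what makes it comparable to the standardized communalities reported for linear factor models.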
3 This is the typical method for setting the metric of the latent factor via standardization, but here we scaled the mean and variance conditioned on the covariates; see Bauer (in press) for further details.
4 To maintain scope and focus, we do not present the vast corpus of results related to parameter recovery within the MNLFA scoring models themselves (e.g., factor loadings, covariate effects, etc.). Importantly, the sampling distributions of parameter estimates from the scoring models are precisely what would be expected from theory (e.g., higher precision with larger sample size, greater bias with model misspecification).
5 We observed the expected reduction in variability in which larger sample size was associated with lower within-cell variance, but there were no differences in the cell-specific means as a function of sample size.