ABSTRACT
The purpose of this study was to apply a set of rarely reported psychometric indices that, nevertheless, are important to consider when evaluating psychological measures. All can be derived from a standardized loading matrix in a confirmatory bifactor model: omega reliability coefficients, factor determinacy, construct replicability, explained common variance, and percentage of uncontaminated correlations. We calculated these indices and extended the findings of 50 recent bifactor model estimation studies published in psychopathology, personality, and assessment journals. These bifactor-derived indices (most not presented in the articles) provided a clearer and more complete picture of the psychometric properties of the assessment instruments. We reached 2 firm conclusions. First, although all measures had been tagged “multidimensional,” unit-weighted total scores overwhelmingly reflected variance due to a single latent variable. Second, unit-weighted subscale scores often had ambiguous interpretations because their variance mostly reflected the general, not the specific, trait. Finally, we review the implications of our evaluations and consider the limits of inferences drawn from a bifactor modeling approach.
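As a concrete illustration of how these indices follow from a standardized loading matrix alone, the sketch below computes omega (total), omega-hierarchical, explained common variance (ECV), construct replicability (H), and the percentage of uncontaminated correlations (PUC) for a hypothetical six-item bifactor solution. The loadings are invented for illustration and are not drawn from any of the reviewed studies; factor determinacy additionally requires the model-implied correlation matrix and is omitted here.

```python
# Illustrative standardized bifactor loading matrix (hypothetical values).
# Rows = items; column 0 = general factor, columns 1-2 = group factors
# (zeros where an item does not load on a group factor).
loadings = [
    [0.60, 0.40, 0.00],
    [0.55, 0.35, 0.00],
    [0.65, 0.30, 0.00],
    [0.50, 0.00, 0.45],
    [0.70, 0.00, 0.40],
    [0.60, 0.00, 0.35],
]

gen = [row[0] for row in loadings]
groups = [[row[k] for row in loadings] for k in range(1, len(loadings[0]))]

# Item uniquenesses under the standardized model: 1 - communality.
uniq = [1 - sum(l * l for l in row) for row in loadings]

# Omega (total): all modeled common variance over total variance.
common = sum(gen) ** 2 + sum(sum(g) ** 2 for g in groups)
omega_total = common / (common + sum(uniq))

# Omega-hierarchical: general-factor variance over the same denominator.
omega_h = sum(gen) ** 2 / (common + sum(uniq))

# ECV: proportion of common variance due to the general factor.
ecv = sum(l * l for l in gen) / sum(l * l for row in loadings for l in row)

# H (construct replicability) for the general factor.
h_num = sum(l * l / (1 - l * l) for l in gen)
H = h_num / (1 + h_num)

# PUC: proportion of item correlations NOT contaminated by group factors,
# i.e., pairs of items that do not share a group factor.
p = len(loadings)
within = sum(n * (n - 1) / 2
             for n in (sum(1 for l in g if l != 0) for g in groups))
puc = 1 - within / (p * (p - 1) / 2)

print(round(omega_total, 3), round(omega_h, 3),
      round(ecv, 3), round(H, 3), round(puc, 3))
```

For this hypothetical matrix the general factor dominates: omega-hierarchical and ECV are high relative to omega (total), the pattern the review found repeatedly in published measures tagged "multidimensional."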
Notes
1 Named “construct reliability” in the original work; more recently, however, “construct replicability” has become the preferred descriptor.
2 Note that unit-weighted summed scores are, in a sense, unsophisticated factor score estimates (Grice, 2001).
3 Factor determinacy and construct replicability have quite different intellectual histories; they represent, however, two approaches to the same psychometric issue. Moreover, when the data are unidimensional, factor determinacy squared and construct replicability are equivalent. In the case of a bifactor model, the values might differ. We do not advance arguments for favoring one over the other, however.
4 If the H value is low (<.80), then using unit-weighted item scores (i.e., not optimally weighted scores) to reflect the underlying latent variable can only produce even less replicable results.
5 Each of these models had been considered “well fitting.” These H and FD results for group factors serve as yet another reminder that “good fit” does not equate to “quality model”—fitting and quality are two different things.
6 Space limits prevent lengthy consideration of the more pragmatic solution of forming parcels based on content domains. We must note, however, that Cronbach (1951) also suggested that researchers aggregate within subdomains and then compute alpha among the subdomain scores to achieve a cleaner estimate of general factor saturation.
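The parcel-based procedure attributed to Cronbach (1951) can be sketched as follows: sum items within each content subdomain, then compute coefficient alpha across the subdomain (parcel) scores. The item responses and subdomain assignments below are invented for illustration.

```python
# Hypothetical data: 5 respondents x 6 items; columns 0-2 form content
# subdomain A, columns 3-5 form subdomain B.
items = [
    [3, 4, 3, 2, 3, 2],
    [4, 5, 4, 4, 4, 5],
    [2, 2, 3, 1, 2, 2],
    [5, 4, 5, 4, 5, 4],
    [3, 3, 2, 3, 3, 3],
]
subdomains = [(0, 1, 2), (3, 4, 5)]

# Parcel scores: unit-weighted sums within each subdomain.
parcels = [[sum(row[i] for i in sd) for sd in subdomains] for row in items]

def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Coefficient alpha computed over the parcel scores, not the raw items:
# alpha = k/(k-1) * (1 - sum of parcel variances / variance of total).
k = len(subdomains)
cols = list(zip(*parcels))
total = [sum(row) for row in parcels]
alpha = k / (k - 1) * (1 - sum(variance(c) for c in cols) / variance(total))
print(round(alpha, 3))
```

Because the parcels aggregate away subdomain-specific item variance, alpha computed this way indexes the saturation of the general factor more cleanly than alpha computed over the raw items.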
7 Indeed, this is exactly the rationale that many confirmatory factor-analytic researchers use to warn of the multidimensional nature of a given scale and the corresponding requirement to only interpret subscale scores.
8 Much the same would be concluded if an analyst fit a correlated two-factor model to these 18 items. The two latent variables would appear well measured, albeit with the correlation between factors confounding interpretation.