Original Articles

Random Permutation Testing Applied to Measurement Invariance Testing with Ordered-Categorical Indicators

Pages 573-587 | Published online: 24 Jan 2018
 

Abstract

We describe and evaluate a random permutation test of measurement invariance with ordered-categorical data. To calculate a p-value for the observed (∆)χ², an empirical reference distribution is built by repeatedly shuffling the grouping variable, then saving the χ² from a configural model, or the ∆χ² between configural and scalar-invariance models, fitted to each permuted dataset. The current gold standard in this context is a robust mean- and variance-adjusted ∆χ² test proposed by Satorra (2000), which yields inflated Type I error rates, particularly when thresholds are asymmetric, unless sample sizes are quite large (Bandalos, 2014; Sass et al., 2014). In a Monte Carlo simulation, we compare permutation to three implementations of Satorra’s robust χ² across a variety of conditions evaluating configural and scalar invariance. Results suggest that permutation can better control Type I error rates while providing comparable power under conditions in which the standard robust test yields inflated error rates.
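The shuffling procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `permutation_p_value` and `pearson_chi2` are hypothetical, the p-value is taken as the simple proportion of permuted statistics at least as large as the observed one, and a Pearson χ² for a single ordinal item stands in for the model-based statistic, since fitting configural and scalar-invariance models requires SEM software.

```python
import random
from collections import Counter


def pearson_chi2(responses, groups):
    """Stand-in statistic (assumption): Pearson chi-square for homogeneity of
    one ordinal item's response distribution across groups. In the article
    the statistic is a model chi-square (or chi-square difference) instead."""
    cell = Counter(zip(groups, responses))   # observed group-by-category counts
    row = Counter(groups)                    # group sizes
    col = Counter(responses)                 # category totals
    n = len(responses)
    stat = 0.0
    for g in row:
        for c in col:
            expected = row[g] * col[c] / n
            stat += (cell[(g, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(responses, groups, statistic, n_perm=1000, seed=0):
    """Build an empirical reference distribution by shuffling group labels.

    Returns the observed statistic and the proportion of permuted
    statistics that are at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(responses, groups)
    shuffled = list(groups)                  # copy so the caller's labels are untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                # reassign cases to groups at random
        if statistic(responses, shuffled) >= observed:
            exceed += 1
    return observed, exceed / n_perm


if __name__ == "__main__":
    # Hypothetical toy data: two groups of 20 with different response patterns.
    responses = [1] * 15 + [2] * 5 + [1] * 5 + [2] * 15
    groups = ["a"] * 20 + ["b"] * 20
    stat, p = permutation_p_value(responses, groups, pearson_chi2, n_perm=500)
    print(f"observed statistic = {stat:.2f}, permutation p = {p:.3f}")
```

Passing the statistic as a function keeps the shuffling loop independent of how the statistic is computed, so a routine that fits the two invariance models and returns their ∆χ² could be swapped in without changing the rest of the sketch.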

Acknowledgment

We would like to thank Yves Rosseel for his helpful technical discussions while investigating different implementations of the mean- and variance-adjusted test statistic, and Paul Johnson for his computational assistance while comparing software packages. We thank the Center for Research Methods and Data Analysis and the College of Liberal Sciences at the University of Kansas for access to their high performance compute cluster on which our Monte Carlo simulations were conducted.

Notes

1 Throughout the manuscript, we restrict our discussion to the case of polychoric correlations for models fit only to ordered-categorical items, but this WLS estimator can also be applied to a mixture of discrete and continuous indicators. When continuous indicators are included, their observed (co)variances are included in the estimated polychoric correlation matrix, and polyserial correlations are estimated between the discrete and continuous indicators.

2 Mean- and variance-adjusted statistics can also be calculated for other estimators, such as maximum likelihood.

3 Note that it is not appropriate to calculate the difference between two adjusted statistics because that difference will not be approximately χ² distributed. Instead, the difference between the unadjusted statistics must be calculated, then adjusted.

4 Details about how to use the DIFFTEST command can be found with Web Note 4 at http://www.statmodel.com/.

5 Jorgensen et al. (2017a) showed that the test of overall model fit tests an overly restrictive null hypothesis because model configurations could be equivalent across populations even if the hypothesized model is not a perfectly accurate representation of them. This issue is discussed elsewhere in greater detail (Jorgensen, 2017; Jorgensen et al., 2017), but it is beyond the scope of the current study, which focuses on situations in which the test fails even in the ideal circumstance that the hypothesized model is a perfect representation of the population(s).

6 Jorgensen et al. (2017a) showed that permuting alternative fit indices also provides valid tests of hypotheses about measurement invariance.

7 Wu and Estabrook (2016) recently showed that it is not possible to test equality of thresholds independently of any other type of measurement parameter. It is only possible to test equality of thresholds on the condition of at least one other type of measurement parameter (for items with four or more categories), at least two other types (for items with three categories), or at least three other types (for binary items). This finding has implications for how measurement invariance should be tested with ordered-categorical indicators, but such a paradigm shift is beyond the scope of the current article.

8 Appendix A also discusses the issue of sparse data, when not all levels of a variable are observed in each group.

9 The application of the permutation method to incomplete data is a topic for future research that is beyond the scope of the current investigation.

10 Jorgensen (2017) discussed modifying configurally invariant models with inadequate fit.

11 If we had fixed the factor means and variances in both groups even in the scalar model, as Sass et al. (2014) did, these differences would have been = 16 and 40, respectively, as Sass et al. (2014) reported. We discuss the implications of this difference in the Discussion section.

12 If software is flexible enough (e.g., general Bayesian modeling software, or more flexible SEM software like OpenMx), it is possible to fit a model to each group that estimates only the thresholds between categories that were observed within each group. Equality constraints could still be imposed on loadings and the thresholds the researcher knows correspond to categories on the same response scale used in each group.
