
Mean and Variance Corrected Test Statistics for Structural Equation Modeling with Many Variables

Pages 827-846 | Published online: 02 May 2019
 

Abstract

Data in social and behavioral sciences are routinely collected using questionnaires, and each domain of interest is tapped by multiple indicators. Structural equation modeling (SEM) is one of the most widely used methods to analyze such data. However, conventional methods for SEM face difficulty when the number of variables (p) is large even when the sample size (N) is also rather large. This article addresses the issue of model inference with the likelihood ratio statistic Tml. Using the method of empirical modeling, mean-and-variance corrected statistics for SEM with many variables are developed. Results show that the new statistics not only perform much better than Tml but also are substantial improvements over other corrections to Tml. When combined with a robust transformation, the new statistics also perform well with non-normally distributed data.
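For orientation, one generic way to build a mean-and-variance corrected statistic (a sketch only; the statistics developed in the article are obtained by empirically modeling the mean and variance of Tml and need not take exactly this form) is to rescale Tml so that its first two moments match those of the reference chi-square distribution with df degrees of freedom:

\[
T_{MV} \;=\; df \;+\; \sqrt{2\,df}\,\frac{T_{ml} - \hat{\mu}}{\hat{\sigma}},
\]

where \hat{\mu} and \hat{\sigma}^2 are estimates of the mean and variance of Tml. The resulting T_MV then has approximate mean df and variance 2df, matching χ²_df.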

Notes

1 While there are other types of big data, in this article we will refer to big data as samples with large p and small N, for simplicity.

2 Lasso regression is not used in our selection of predictors because variables with smaller regression coefficients have been repeatedly found to be more effective in reducing the sum of squares of residuals than variables with larger coefficients.

3 Note that T̄_ml and s² are not literally independent. However, the distribution of χ²_df is approximately symmetric when df ≥ 20 (see Figure 11.1 of Forbes et al., 2011, p. 70), and then the correlation of e_i1 and e_i2 becomes tiny. With many variables, the degrees of freedom of typical SEM models will be much greater than 20, as in the conditions described in the following section. Thus, instead of using generalized least squares, we will simply use WLS to estimate the model in Equation 6, ignoring the correlation between e_i1 and e_i2.
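As a quick check on the symmetry claim (a standard property of the chi-square distribution, not a result from the article): the skewness of χ²_df is

\[
\mathrm{skewness}(\chi^2_{df}) = \sqrt{8/df},
\]

which is about 0.63 at df = 20 and keeps shrinking as df grows, so the approximation only improves for the high-df models considered here.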

4 The number of best subsets might be arbitrary, but our experience indicates that the additional gain becomes minimal as more best subsets are selected. Also, best-subset regression becomes less effective when too many variables are carried into the following step, where their product terms are also included.
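For concreteness, a minimal SAS sketch of best-subset selection of this kind is given below; the dataset name mydata, response y, and candidate predictors v1-v10 are placeholders rather than the article's actual variables, and the exact options used in the article may differ.

proc reg data=mydata;
   /* evaluate all subsets by R-square, keep the 5 best subsets of each size,
      and consider models with at most 4 predictors */
   model y = v1-v10 / selection=rsquare best=5 stop=4;
run;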

5 The table does not contain the values of s², which were kindly provided by Dr. Dexin Shi via personal communication.

6 The option “model y = v1-v10/selection = maxR; weight w;” under Proc Reg allows us to select the best predictors from v1 to v10 according to weighted least squares.
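Written out as a complete (hypothetical) PROC REG step, with mydata, y, v1-v10, and the weight variable w standing in for the actual data:

proc reg data=mydata;
   weight w;                            /* weighted least squares */
   model y = v1-v10 / selection=maxr;   /* maximum R-square improvement selection */
run;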

7 For the statistic Tml, 92.6% of the r_w1 values and 46.18% of the r_w2 values are greater than 1.96. By contrast, 0.0% of r_w1 and 0.54% of r_w2 are smaller than −1.96.

8 For a p×p non-negative definite matrix A, its power can be obtained as A^c = V U^c V′, where V = (v_1, v_2, …, v_p), with v_j being the jth eigenvector of A corresponding to the jth eigenvalue u_j, and U^c is the diagonal matrix whose jth diagonal element is u_j^c.
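For illustration, this eigen-decomposition construction can be carried out in SAS/IML as follows; the matrix A and exponent c below are arbitrary examples, not values from the article.

proc iml;
   A = {4 1,
        1 3};                       /* a 2 x 2 non-negative definite matrix */
   c = 0.5;                         /* e.g., the symmetric square root      */
   call eigen(u, V, A);             /* u = eigenvalues, V = eigenvectors    */
   Ac = V * diag(u##c) * t(V);      /* A^c = V * U^c * V'                   */
   print Ac;
quit;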

9 For a sample from a normally distributed population, the squared Mahalanobis distance d_i² approximately follows χ²_p.
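A small SAS/IML sketch of this diagnostic (the dataset name mydata is a placeholder); the resulting d_i² values can be compared with quantiles of χ²_p:

proc iml;
   use mydata;
   read all var _num_ into X;         /* n x p data matrix (numeric variables) */
   close mydata;
   n = nrow(X);
   xbar = mean(X);                    /* 1 x p vector of sample means  */
   D = X - repeat(xbar, n, 1);        /* centered data                 */
   S = (t(D) * D) / (n - 1);          /* p x p sample covariance       */
   d2 = vecdiag(D * inv(S) * t(D));   /* squared Mahalanobis distances */
quit;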

Additional information

Funding

The research was supported by the National Science Foundation, Division of Social and Economic Sciences, under Grant No. SES-1461355.
