Assessing Dimensionality in Non-Positive Definite Tetrachoric Correlation Matrices: Does Matrix Smoothing Help?

Pages 385-407 | Published online: 30 Dec 2020

Abstract

We performed two simulation studies that investigated dimensionality recovery in non-positive definite (NPD) tetrachoric correlation matrices using parallel analysis. In each study, the NPD matrices were rehabilitated by three smoothing algorithms. In Study 1, we replicated the work by Debelak and Tran on the assessment of dimensionality in one- or two-dimensional common factor models. In Study 2, we extended the Debelak and Tran design in three important ways. Specifically, we investigated: (a) a wider range of factors; (b) models with varying amounts of model error; and (c) models generated from more realistic population item parameters. Our results indicated that matrix smoothing of NPD tetrachoric correlation matrices improves the performance of parallel analysis with binary data. However, these improvements were modest and often of trivial size. To demonstrate the effect of matrix smoothing on an empirical data set, we applied parallel analysis and factor analysis to Adjective Checklist data from the California Twin Registry.

Article information

Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was not supported by any grant.

Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The authors would like to thank the anonymous reviewers for their comments on prior versions of this manuscript, as well as Rudolf Debelak and Ulrich Tran for providing R code and additional details that were necessary for the replication of Debelak and Tran (Citation2013). The authors also acknowledge the Minnesota Supercomputing Institute (MSI; http://www.msi.umn.edu) and Liberal Arts Technologies and Innovation Services (LATIS; https://cla.umn.edu/latis) at the University of Minnesota for providing resources that contributed to the research results reported within this paper. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors' institutions is not intended and should not be inferred.

Notes

1 We use the term non-positive definite to refer only to matrices with both positive and negative eigenvalues. The set of non-positive definite matrices does not include negative semidefinite matrices (having all non-positive eigenvalues).

2 Wothke (Citation1993) provides a clear explanation of the underlying geometry for why “correlation” matrices can have negative eigenvalues.
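To make the point of notes 1 and 2 concrete, consider the small illustration below (ours, not from the paper): each pairwise entry of the matrix is an admissible correlation, yet the joint pattern is impossible, so one eigenvalue is negative and the matrix is NPD. The final line shows one possible rehabilitation, eigenvalue smoothing via psych::cor.smooth(); this is offered only as an illustration and is not necessarily one of the three smoothing algorithms compared in the studies.

# A "correlation" matrix whose pairwise entries are each admissible but whose
# joint pattern is impossible; its eigenvalues are 1.9, 1.9, and -0.8.
R_npd <- matrix(c(1.0,  0.9,  0.9,
                  0.9,  1.0, -0.9,
                  0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
eigen(R_npd)$values

# Eigenvalue smoothing (illustrative only) restores positive semidefiniteness.
eigen(psych::cor.smooth(R_npd))$values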

3 Tables and figures that appear in the online supplement are indicated by the “S” prefix (e.g., Table S1, Figure S2).

4 Because simFA() requires item parameters to be specified in the factor loading and item threshold metrics, we calculated factor loadings from the item discrimination parameters via

$f_i = \dfrac{a_i/1.702}{\sqrt{1 + (a_i/1.702)^2}},$

where $f_i$ denotes the associated factor loading for item i, $a_i$ denotes the IRT discrimination for item i, and 1.702 is a scaling constant used to convert the logistic IRT discrimination parameters into the normal ogive metric (Kamata & Bauer, Citation2008; Camilli, Citation1994; Savalei, Citation2006). To generate item thresholds, we used the logistic item discrimination (a) and difficulty (b) parameters to obtain the item intercept terms, $\beta_i = a_i b_i$, where $\beta_i$ denotes the intercept for item i, $a_i$ denotes the item discrimination parameter, and $b_i$ denotes the item difficulty parameter (Kamata & Bauer, Citation2008, p. 140). Next, we calculated item thresholds $\tau_i = (\beta_i/1.702)\sqrt{1 - f_i^2}$ (Kamata & Bauer, Citation2008, p. 144).
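As a small worked example of this conversion (the helper name irt2fa_params and the item parameter values below are ours, for illustration only), the following R code maps logistic 2PL discrimination and difficulty parameters onto factor loadings and thresholds using the formulas above.

# Convert logistic 2PL item parameters (a, b) into normal-ogive factor
# loadings and thresholds via the formulas in this note.
# Function name and example values are illustrative, not from the paper.
irt2fa_params <- function(a, b, D = 1.702) {
  a_star <- a / D                        # discrimination in the normal-ogive metric
  f      <- a_star / sqrt(1 + a_star^2)  # factor loading, f_i
  beta   <- a * b                        # item intercept, beta_i (logistic metric)
  tau    <- (beta / D) * sqrt(1 - f^2)   # item threshold, tau_i
  data.frame(a = a, b = b, loading = f, threshold = tau)
}

irt2fa_params(a = c(0.8, 1.2, 2.0), b = c(-0.5, 0.0, 1.0))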

5 For notational convenience, we include no smoothing (None) as a possible smoothing method.

6 This is the default behaviour of the fa.parallel.poly() function in the R psych package (Revelle, Citation2019) that was used by Debelak and Tran (Citation2013).
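For readers who wish to reproduce this step, a minimal call is sketched below. The object binary_data is a placeholder for an n-by-p matrix of 0/1 item responses, and the settings shown are illustrative rather than those used in the paper; in recent releases of psych, the analysis performed by fa.parallel.poly() is also available through fa.parallel() with cor = "poly".

library(psych)

# binary_data is a placeholder for a matrix of dichotomous item responses.
pa <- fa.parallel(binary_data, cor = "poly", fa = "pc", n.iter = 1000)
pa$ncomp  # number of components suggested by parallel analysis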

7 Specifically, we defined the estimated number of dimensions as the number of structured-data eigenvalues (from the smoothed tetrachoric correlation matrix) that were larger than their associated median eigenvalues from 1,000 random-data tetrachoric correlation matrices. Counting stopped as soon as the first (ordered) structured-data eigenvalue fell below its associated random-data median eigenvalue.
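The counting rule described in this note can be written directly in terms of the ordered eigenvalues. In the sketch below (function and argument names are ours), obs_eigs holds the ordered eigenvalues of the smoothed tetrachoric correlation matrix and rand_eigs is a 1,000-by-p matrix whose rows hold the ordered eigenvalues of the random-data tetrachoric correlation matrices.

# Estimate the number of dimensions: count ordered structured-data eigenvalues
# that exceed the median of their corresponding random-data eigenvalues,
# stopping at the first eigenvalue that fails the comparison.
count_dimensions <- function(obs_eigs, rand_eigs) {
  med_eigs   <- apply(rand_eigs, 2, median)  # column-wise medians of random eigenvalues
  exceeds    <- obs_eigs > med_eigs
  first_fail <- which(!exceeds)[1]           # position of the first failure, if any
  if (is.na(first_fail)) length(obs_eigs) else first_fail - 1
}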

8 To our knowledge, we are the first investigators to implement this more stringent distinction between major and minor common factors when using the Tucker et al. (Citation1969) method for generating model approximation error. An argument to implement this feature is available in simFA() (Waller, Citation2020).

9 It merits comment that models with different design characteristics (e.g., different numbers of factors, different numbers of indicators per factor, etc.) often required different values of Up and ϵ to achieve RMSEA values in a specified range. Moreover, models with equal Up values could apportion different amounts of total variance to the minor common factors. By grouping our models based on their RMSEA values, rather than on the Up and ϵ values, we were able to align our model fit conditions with published rules of thumb (Lai & Green, Citation2016; Hu & Bentler, Citation1999; Marsh et al., Citation2004).

10 As shown in Table S13, when moving from the Perfect- to Moderate-fit models, the NPD rates increased by over 12 percentage points in one out of five conditions (i.e., when conditioning on all variables except model fit), and by over 30 percentage points in one out of ten conditions. In the most extreme case, the NPD rate increased by 82 percentage points (i.e., when Factors = 10, Indicators = 5, Loading = 0.4, $\phi_{f_i f_j} = 0$, and N = 500). In aggregate, these results point to the importance of including model approximation error in Monte Carlo studies of matrix definiteness.

11 The corSample() function in the R fungible package (Waller, Citation2020) was used to sample correlation matrices at given sample sizes from population correlation matrices. This function implements an algorithm due to Kshirsagar (Citation1959).
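A minimal usage sketch is given below; the population matrix R_pop and the sample size are illustrative, and the returned list (inspected here with str()) contains the sampled correlation matrix.

library(fungible)

# Illustrative 3 x 3 population correlation matrix (not from the paper).
R_pop <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)

# Draw a sample correlation matrix of size N = 500 from R_pop using the
# Kshirsagar (1959) algorithm implemented in corSample().
out <- corSample(R_pop, n = 500)
str(out)  # the returned list includes the sampled correlation matrix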

12 Note that CRMR is, by definition, an upper bound to the standardized root mean squared residual (SRMR) when correlation matrices are used (Maydeu-Olivares, Citation2017).
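To recall the definition underlying this note, the CRMR is the root mean square of the unique off-diagonal residual correlations; because the SRMR also averages over the diagonal residuals, which are zero when correlation matrices are analyzed, the CRMR bounds it from above. A minimal R sketch, with S and Sigma_hat as placeholders for the observed and model-implied correlation matrices:

# CRMR: root mean square of the unique off-diagonal residual correlations.
# S and Sigma_hat are placeholders for observed and model-implied matrices.
crmr <- function(S, Sigma_hat) {
  resid <- S - Sigma_hat
  off   <- resid[lower.tri(resid)]  # p(p - 1)/2 unique off-diagonal residuals
  sqrt(mean(off^2))
}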

13 A Google Scholar search on the terms “Gough adjective checklist” yielded 36,020 citations on May 14, 2020.
