Assessing Dimensionality in Non-Positive Definite Tetrachoric Correlation Matrices: Does Matrix Smoothing Help?

Pages 385-407 | Published online: 30 Dec 2020

Abstract

We performed two simulation studies that investigated dimensionality recovery in non-positive definite (NPD) tetrachoric correlation matrices using parallel analysis. In each study, the NPD matrices were rehabilitated by three smoothing algorithms. In Study 1, we replicated the work by Debelak and Tran on the assessment of dimensionality in one- or two-dimensional common factor models. In Study 2, we extended the Debelak and Tran design in three important ways. Specifically, we investigated: (a) a wider range of factors; (b) models with varying amounts of model error; and (c) models generated from more realistic population item parameters. Our results indicated that matrix smoothing of NPD tetrachoric correlation matrices improves the performance of parallel analysis with binary data. However, these improvements were modest and often of trivial size. To demonstrate the effect of matrix smoothing on an empirical data set, we applied parallel analysis and factor analysis to Adjective Checklist data from the California Twin Registry.

Article information

Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was not supported by any grant.

Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The authors would like to thank the anonymous reviewers for their comments on prior versions of this manuscript, as well as Rudolf Debelak and Ulrich Tran for providing R code and additional details that were necessary for the replication of Debelak and Tran (Citation2013). The authors also acknowledge the Minnesota Supercomputing Institute (MSI; http://www.msi.umn.edu) and Liberal Arts Technologies and Innovation Services (LATIS; https://cla.umn.edu/latis) at the University of Minnesota for providing resources that contributed to the research results reported within this paper. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors' institutions is not intended and should not be inferred.

Notes

1 We use the term non-positive definite to refer only to matrices with both positive and negative eigenvalues. The set of non-positive definite matrices does not include negative semidefinite matrices (having all non-positive eigenvalues).

2 Wothke (Citation1993) provides a clear explanation of the underlying geometry for why “correlation” matrices can have negative eigenvalues.
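To make the point of notes 1 and 2 concrete, consider the small illustration below (ours, not from the paper): each pairwise entry of the matrix is an admissible correlation, yet the joint pattern is impossible, so one eigenvalue is negative and the matrix is NPD. The final line shows one possible rehabilitation, eigenvalue smoothing via psych::cor.smooth(); this is offered only as an illustration and is not necessarily one of the three smoothing algorithms compared in the studies.

# A "correlation" matrix whose pairwise entries are each admissible but whose
# joint pattern is impossible; its eigenvalues are 1.9, 1.9, and -0.8.
R_npd <- matrix(c(1.0,  0.9,  0.9,
                  0.9,  1.0, -0.9,
                  0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
eigen(R_npd)$values

# Eigenvalue smoothing (illustrative only) restores positive semidefiniteness.
eigen(psych::cor.smooth(R_npd))$values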

3 Tables and figures that appear in the online supplement are indicated by the “S” prefix (e.g., Table S1, Figure S2).

4 Because simFA() requires item parameters to be specified in the factor loading and item threshold metrics, we calculated factor loadings from the item discrimination parameters via

$f_i = \dfrac{a_i/1.702}{\sqrt{1 + (a_i/1.702)^2}},$

where $f_i$ denotes the associated factor loading for item i, $a_i$ denotes the IRT discrimination for item i, and 1.702 is a scaling constant used to convert the logistic IRT discrimination parameters into the normal ogive metric (Kamata & Bauer, Citation2008; Camilli, Citation1994; Savalei, Citation2006). To generate item thresholds, we used the logistic item discrimination (a) and difficulty (b) parameters to obtain the item intercept terms, $\beta_i = a_i b_i$, where $\beta_i$ denotes the intercept for item i, $a_i$ denotes the item discrimination parameter, and $b_i$ denotes the item difficulty parameter (Kamata & Bauer, Citation2008, p. 140). Next, we calculated item thresholds $\tau_i = (\beta_i/1.702)\sqrt{1 - f_i^2}$ (Kamata & Bauer, Citation2008, p. 144).
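As a small worked example of this conversion (the helper name irt2fa_params and the item parameter values below are ours, for illustration only), the following R code maps logistic 2PL discrimination and difficulty parameters onto factor loadings and thresholds using the formulas above.

# Convert logistic 2PL item parameters (a, b) into normal-ogive factor
# loadings and thresholds via the formulas in this note.
# Function name and example values are illustrative, not from the paper.
irt2fa_params <- function(a, b, D = 1.702) {
  a_star <- a / D                        # discrimination in the normal-ogive metric
  f      <- a_star / sqrt(1 + a_star^2)  # factor loading, f_i
  beta   <- a * b                        # item intercept, beta_i (logistic metric)
  tau    <- (beta / D) * sqrt(1 - f^2)   # item threshold, tau_i
  data.frame(a = a, b = b, loading = f, threshold = tau)
}

irt2fa_params(a = c(0.8, 1.2, 2.0), b = c(-0.5, 0.0, 1.0))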

5 For notational convenience, we include no smoothing (None) as a possible smoothing method.

6 This is the default behaviour of the fa.parallel.poly() function in the R psych package (Revelle, Citation2019) that was used by Debelak and Tran (Citation2013).
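For readers who wish to reproduce this step, a minimal call is sketched below. The object binary_data is a placeholder for an n-by-p matrix of 0/1 item responses, and the settings shown are illustrative rather than those used in the paper; in recent releases of psych, the analysis performed by fa.parallel.poly() is also available through fa.parallel() with cor = "poly".

library(psych)

# binary_data is a placeholder for a matrix of dichotomous item responses.
pa <- fa.parallel(binary_data, cor = "poly", fa = "pc", n.iter = 1000)
pa$ncomp  # number of components suggested by parallel analysis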

7 Specifically, we defined the estimated number of dimensions as the number of structured-data eigenvalues (from the smoothed tetrachoric correlation matrix) that were larger than their associated median eigenvalues from 1,000 random-data tetrachoric correlation matrices. Counting stopped as soon as the first (ordered) structured-data eigenvalue fell below its associated random-data median eigenvalue.
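The counting rule described in this note can be written directly in terms of the ordered eigenvalues. In the sketch below (function and argument names are ours), obs_eigs holds the ordered eigenvalues of the smoothed tetrachoric correlation matrix and rand_eigs is a 1,000-by-p matrix whose rows hold the ordered eigenvalues of the random-data tetrachoric correlation matrices.

# Estimate the number of dimensions: count ordered structured-data eigenvalues
# that exceed the median of their corresponding random-data eigenvalues,
# stopping at the first eigenvalue that fails the comparison.
count_dimensions <- function(obs_eigs, rand_eigs) {
  med_eigs   <- apply(rand_eigs, 2, median)  # column-wise medians of random eigenvalues
  exceeds    <- obs_eigs > med_eigs
  first_fail <- which(!exceeds)[1]           # position of the first failure, if any
  if (is.na(first_fail)) length(obs_eigs) else first_fail - 1
}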

8 To our knowledge, we are the first investigators to implement this more stringent distinction between major and minor common factors when using the Tucker et al. (Citation1969) method for generating model approximation error. An argument to implement this feature is available in simFA() (Waller, Citation2020).

9 It merits comment that models with different design characteristics (e.g., different numbers of factors, different numbers of indicators per factor, etc.) often required different values of Up and ϵ to achieve RMSEA values in a specified range. Moreover, models with equal Up values could apportion different amounts of total variance to the minor common factors. By grouping our models based on their RMSEA values, rather than on the Up and ϵ values, we were able to align our model fit conditions with published rules of thumb (Lai & Green, Citation2016; Hu & Bentler, Citation1999; Marsh et al., Citation2004).

10 As shown in Table S13, when moving from the Perfect- to Moderate-fit models, the NPD rates increased by over 12 percentage points in one out of five conditions (i.e., when conditioning on all variables except model fit), and by over 30 percentage points in one out of ten conditions. In the most extreme case, the NPD rate increased by 82 percentage points (i.e., when Factors = 10, Indicators = 5, Loading = 0.4, $\phi_{f_i f_j} = 0$, and N = 500). In aggregate, these results point to the importance of including model approximation error in Monte Carlo studies of matrix definiteness.

11 The corSample() function in the R fungible package (Waller, Citation2020) was used to sample correlation matrices at given sample sizes from population correlation matrices. This function implements an algorithm due to Kshirsagar (Citation1959).
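A minimal usage sketch is given below; the population matrix R_pop and the sample size are illustrative, and the returned list (inspected here with str()) contains the sampled correlation matrix.

library(fungible)

# Illustrative 3 x 3 population correlation matrix (not from the paper).
R_pop <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)

# Draw a sample correlation matrix of size N = 500 from R_pop using the
# Kshirsagar (1959) algorithm implemented in corSample().
out <- corSample(R_pop, n = 500)
str(out)  # the returned list includes the sampled correlation matrix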

12 Note that CRMR is, by definition, an upper bound to the standardized root mean squared residual (SRMR) when correlation matrices are used (Maydeu-Olivares, Citation2017).
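To recall the definition underlying this note, the CRMR is the root mean square of the unique off-diagonal residual correlations; because the SRMR also averages over the diagonal residuals, which are zero when correlation matrices are analyzed, the CRMR bounds it from above. A minimal R sketch, with S and Sigma_hat as placeholders for the observed and model-implied correlation matrices:

# CRMR: root mean square of the unique off-diagonal residual correlations.
# S and Sigma_hat are placeholders for observed and model-implied matrices.
crmr <- function(S, Sigma_hat) {
  resid <- S - Sigma_hat
  off   <- resid[lower.tri(resid)]  # p(p - 1)/2 unique off-diagonal residuals
  sqrt(mean(off^2))
}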

13 A Google Scholar search on the terms “Gough adjective checklist” yielded 36,020 citations on May 14, 2020.
