Original Articles

Structural Equation Modeling Approaches for Analyzing Partially Nested Data

Pages 93-118 | Published online: 07 Apr 2014
 

Abstract

Study designs involving clustering in some study arms, but not all study arms, are common in clinical treatment-outcome and educational settings. For instance, in a treatment arm, persons may be nested in therapy groups, whereas in a control arm there are no groups. Methodological approaches for handling such partially nested designs have recently been developed in a multilevel modeling framework (MLM-PN) and have proved very useful. We introduce two alternative structural equation modeling (SEM) approaches for analyzing partially nested data: a multivariate single-level SEM (SSEM-PN) and a multiple-arm multilevel SEM (MSEM-PN). We show how SSEM-PN and MSEM-PN can produce results equivalent to existing MLM-PNs and can be extended to flexibly accommodate several modeling features that are difficult or impossible to handle in MLM-PNs. For instance, using an SSEM-PN or MSEM-PN, it is possible to specify complex structural models involving cluster-level outcomes, obtain absolute model fit, decompose person-level predictor effects in the treatment arm using latent cluster means, and include traditional factors as predictors/outcomes. Importantly, implementation of such features for partially nested designs differs from that for fully nested designs. An empirical example involving a partially nested depression intervention combines several of these features in an analysis of interest for treatment-outcome studies.

Notes

Some of these simulations also compared the MLM-PN with other alternatives including (a) treatment clusters as fixed effects; (b) assigning all control persons the same subject identifier and considering them as constituting a single large cluster when fitting a standard MLM; or (c) after study completion, dividing control persons arbitrarily into multiperson “clusters” to mimic the data structure in the treatment arm and then fitting a standard MLM. The MLM-PN also performed favorably vis-à-vis these alternatives (Baldwin et al., 2011; Korendijk et al., 2012; Sanders, 2011).

We use the term “multiple-arm model” in place of the more conventional SEM term “multiple-group model” because we have already used the word “group” to refer to cluster (as in therapy group).

Equations in the text correspond with a conditional likelihood formulation (e.g., as used in Mplus). Implementation with LISREL (which uses a joint likelihood formulation) requires slightly different specification for some models; examples are given in the online Appendix.

In practice, homoscedasticity of residual variances across arm is more commonly considered for unconditional models or when the same Level 1 predictors are used in each arm.

One approach to circumvent this limitation is described shortly in this article; it involves minor recoding of the outcome variable but in exchange affords the benefits of (a) a transparent model specification in which all parameters are interpretable and (b) a likelihood identical to equivalent MSEM-PNs and MLM-PNs fitted in this article. Some software packages (e.g., Mplus but not LISREL) permit an alternative approach relying on a true multiple-arm SEM architecture, but require a trick in which variables for nonexistent persons are nonetheless included in the control arm model specification, with their variances fixed to near 0 and their effects held equal to those in the treatment arm. This approach can encounter estimation problems for some of the more complex models considered later. It also entails a less transparent model specification (i.e., half of the specified model is not interpretable). For these reasons, the latter approach was not used in the SSEM-PN. Other approaches are discussed in Widaman et al. (2013) and Kim et al. (in press) in different contexts (e.g., measurement invariance testing).

In the control arm, if the Level 2 variance ($\psi^{c}$) had instead been estimated, as in , Panel B, achieving homoscedastic residual variances across arm would require the constraint $\theta^{t}_{\epsilon} = \psi^{c}$.

Considerable subject-initiated missing data in a given arm, leading to low covariance coverage, can preclude calculation of the modified saturated model in Mplus (which does not happen when covariance coverage is exactly 0).

A data management step similar to that used in MLM-PN (Bauer et al., 2008) is required to prevent listwise deletion of cases with covariate missingness-by-design (i.e., covariate missingness arising due to the partial nesting design structure) under FIML when using exogenous predictors and a conditional likelihood. Some SEM software uses a conditional likelihood (e.g., Mplus 6.1 or later but not LISREL 8.8); general documentation regarding this data management step is at www.statmodel.com. Here, for controls, an arbitrary value that is not the missing data code (we use 0) is assigned to $x_{2j}$ through $x_{5j}$ and to $w_j$. This insertion for $x_{2j}$ through $x_{5j}$ does not affect results because the corresponding outcomes $y^{t}_{2j}$ through $y^{t}_{5j}$ are missing; this insertion for $w_j$ does not affect results because here its slope for controls is fixed to 0.

As described in the MLM-PN section (Bauer et al., 2008), assigning control cases an arbitrary value that is not the missing data code (e.g., 0) for missing-by-design scores on $w_j$ prevents listwise deletion of these cases when using FIML with exogenous predictors and a conditional likelihood. Also, MSEM-PN requires $w_j$ to have nonzero variance in each arm; this can be addressed in Mplus by specifying variances = nocheck or more generally by assigning at least two different arbitrary values for missing-by-design scores on $w_j$. Similar to MLM-PN, the choice of what arbitrary values to assign does not affect results (as the slope of $w_j$ is fixed to 0 for controls; see also Bauer et al., 2008).
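The data management step described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `fill_missing_by_design`, the row layout, the arm labels, and the use of `None` as the missing data code are all hypothetical. Control rows receive arbitrary non-missing values on the missing-by-design covariate, and cycling through two distinct values gives the covariate nonzero variance in the control arm.

```python
MISSING = None  # hypothetical missing-data code for this sketch


def fill_missing_by_design(rows, covariates, arm_key="arm",
                           control="control", fill_values=(0.0,)):
    """Return copies of `rows` with control-arm missing covariates filled.

    `fill_values` is cycled across control rows. Passing two distinct
    values gives each filled covariate nonzero variance among controls.
    Rows from other arms are copied unchanged.
    """
    filled = []
    n_control = 0
    for row in rows:
        new_row = dict(row)  # copy so the input data are not modified
        if row[arm_key] == control:
            value = fill_values[n_control % len(fill_values)]
            for name in covariates:
                if new_row.get(name) is MISSING:
                    new_row[name] = value
            n_control += 1
        filled.append(new_row)
    return filled


# Hypothetical data: w is a cluster-level covariate that exists only
# for treated persons, so it is missing by design for controls.
rows = [
    {"id": 1, "arm": "treatment", "w": 1.2},
    {"id": 2, "arm": "control", "w": None},
    {"id": 3, "arm": "control", "w": None},
    {"id": 4, "arm": "treatment", "w": None},  # not by design: left alone
]
filled = fill_missing_by_design(rows, ["w"], fill_values=(0.0, 1.0))
```

Because the slope of the covariate is fixed to 0 for controls, the particular values chosen should not matter; the fill only keeps the control cases in the likelihood.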

MLM-PNs were rerun using alternative methods for computing degrees of freedom for t tests of fixed effects (the Kenward-Roger method from Baldwin et al. [2011] and Bauer et al. [2008], which is available in only some multilevel modeling packages, and a Containment method). Standard errors were the same to three decimal places. In contrast, SEM packages typically provide only z tests for fixed effects. Baldwin et al. showed that, unless there were very few clusters in the treatment arm (<8), alternative df computation methods provided equally good Type I error rates in MLM-PN. Our empirical example and simulations had more clusters than this in the treatment arm.

In this MLM trick, multivariate outcomes are concatenated vertically in the data set, and one toggle indicator variable is constructed per outcome (e.g., the first toggle indicator is 1 if yij refers to the first outcome, 0 otherwise). A univariate MLM is fit, but no overall intercept is estimated; rather, main effects of the indicator variables refer to outcome-specific intercepts. Product terms of the indicator variables with each predictor are entered to allow the predictors to have outcome-specific effects.
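The restructuring behind this trick can be illustrated with a short Python sketch. The function name `stack_outcomes` and the column names (`y`, `d1`, `d2`, `d1_x_x`) are hypothetical choices made for this example. Each wide-format row is expanded into one long-format row per outcome, with one toggle indicator per outcome and one indicator-by-predictor product term per outcome and predictor.

```python
def stack_outcomes(wide_rows, outcomes, predictors, id_key="id"):
    """Stack multivariate outcomes vertically with toggle indicators.

    For each input row and each outcome k, emit one long-format row with:
      y        the value of outcome k
      d1..dK   toggle indicators (dk is 1 for outcome k, 0 otherwise)
      dk_x_p   product of toggle indicator dk with predictor p
    """
    long_rows = []
    for row in wide_rows:
        for k, outcome in enumerate(outcomes, start=1):
            long_row = {id_key: row[id_key], "y": row[outcome]}
            for m in range(1, len(outcomes) + 1):
                toggle = 1 if m == k else 0
                long_row[f"d{m}"] = toggle
                for p in predictors:
                    long_row[f"d{m}_x_{p}"] = toggle * row[p]
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide-format data: two outcomes and one predictor.
wide = [
    {"id": 1, "y1": 3.0, "y2": 5.0, "x": 2.0},
    {"id": 2, "y1": 4.0, "y2": 6.0, "x": 1.0},
]
stacked = stack_outcomes(wide, outcomes=["y1", "y2"], predictors=["x"])
```

A univariate model fit to `stacked` without an overall intercept would then use `d1` and `d2` as outcome-specific intercepts and the product columns as outcome-specific slopes.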

Strategy B also applies to MSEM-PN; it is less commonly implemented, so is omitted to save space.

Strategy A is not estimable in the SSEM-PN due to the ipsativity described in Curran et al. (2012).

If a predictor (i.e., column in a wide-format data set) is an exact linear function of other predictors (columns), the covariance matrix of predictors has a zero determinant. However, in the partial nesting data structure, the observed cluster mean is not the same linear function of $x_{1j}$ through $x_{5j}$ for all rows of the data set. The observed cluster mean is calculated from the present data only and so will be a different function depending on how many people are in a given cluster. In the control arm, only $x_{1j}$ is used in computing it. As mentioned previously, in the control arm, $y_{2j}$ through $y_{5j}$ and $x_{2j}$ through $x_{5j}$ are missing-by-design, and the latter are assigned an arbitrary numeric value (not the missing code) to retain the control persons in the conditional likelihood computation (see Footnote 8 and see Bauer et al., 2008).

This predictor was divided by 10 to render its scale more comparable to scales of the other variables.

In multilevel (or partially nested) settings, confidence intervals for indirect effects can be obtained using the Monte Carlo method described in Preacher and Selig (2012) and MacKinnon, Lockwood, and Williams (2004).
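As a rough illustration of the Monte Carlo idea, the sketch below simulates the two paths of an indirect effect from a bivariate normal distribution centered on their estimates and takes percentiles of the simulated products. It is a minimal sketch, not the cited authors' implementation: the function name `monte_carlo_ci`, the path estimates, and the standard errors are hypothetical, and the covariance between the two estimates defaults to 0, which is an assumption that may not hold for a given model.

```python
import math
import random


def monte_carlo_ci(a, se_a, b, se_b, cov_ab=0.0,
                   draws=20000, level=0.95, seed=1):
    """Percentile interval for the product a*b from simulated draws.

    a, b       point estimates of the two paths
    se_a, se_b their standard errors (must be positive)
    cov_ab     sampling covariance of the two estimates (assumed 0 here)
    """
    rng = random.Random(seed)
    rho = cov_ab / (se_a * se_b)  # implied correlation of the estimates
    products = []
    for _ in range(draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a_star = a + se_a * z1
        # Build b_star so that it correlates rho with a_star.
        b_star = b + se_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        products.append(a_star * b_star)
    products.sort()
    alpha = 1.0 - level
    lower = products[math.floor((alpha / 2) * (draws - 1))]
    upper = products[math.ceil((1 - alpha / 2) * (draws - 1))]
    return lower, upper


# Hypothetical estimates, used only to show the call.
lower, upper = monte_carlo_ci(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
```

The interval is generally asymmetric around the product of the point estimates, which is the main reason for simulating it instead of relying on a symmetric normal-theory interval.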
