
Problems with Rationales for Parceling that Fail to Consider Parcel-Allocation Variability

Pages 264-287 | Published online: 12 Feb 2019
 

Abstract

In structural equation modeling applications, parcels—averages or sums of subsets of item scores—are often used as indicators of latent constructs. Parcel-allocation variability (PAV) is variability in results that arises within a given sample across alternative item-to-parcel allocations. PAV can manifest in all results of a parcel-level model (e.g., model fit, parameter estimates, standard errors, and inferential decisions). It is a source of uncertainty in parcel-level model results that can be investigated, reported, and accounted for. Failing to do so raises representativeness and replicability concerns. However, in recent methodological literature (Cole, Perkins, & Zelkowitz, 2016; Little, Rhemtulla, Gibson, & Schoemann, 2013; Marsh, Lüdtke, Nagengast, Morin, & von Davier, 2013; Rhemtulla, 2016) parceling has been justified and recommended in several situations without quantifying or accounting for PAV. In this article, we explain and demonstrate problems with these rationales. Overall, we find that: (1) using a purposive parceling algorithm for a multidimensional construct does not avoid PAV; (2) passing a test of unidimensionality of the item-level model need not avoid PAV; and (3) a desire to improve power for detecting structural misspecification does not warrant parceling without addressing PAV; we show how to simultaneously avoid PAV and obtain even higher power by comparing item-level models differing in structural constraints. Implications for practice are discussed.
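To make the notion of an item-to-parcel allocation concrete, the following minimal sketch (not from the article; the item scores, sample size, and parcel count are made up) randomly assigns the items of a scale to parcels and computes parcel scores as item means. Fitting the same parcel-level model to two alternative allocations of the same sample is the source of the variability the article calls PAV.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Hypothetical item-score matrix: 200 respondents by 9 items of one scale.
items = rng.normal(size=(200, 9))

def random_allocation(item_scores, n_parcels, rng):
    """Randomly assign item columns to parcels; return per-parcel mean scores."""
    order = rng.permutation(item_scores.shape[1])
    groups = np.array_split(order, n_parcels)
    return np.column_stack([item_scores[:, g].mean(axis=1) for g in groups])

# Two alternative allocations of the same items within the same sample.
parcels_a = random_allocation(items, n_parcels=3, rng=rng)
parcels_b = random_allocation(items, n_parcels=3, rng=rng)
# Fitting the same parcel-level model to parcels_a and parcels_b can yield
# different fit, estimates, SEs, and inferential decisions: that is PAV.
```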

Notes

1 Additionally, Sterba and Rights (2017) showed that, even in the absence of both measurement model error and sampling error, PAV in results can arise simply when there are unequal item loadings on a factor.

2 For discussion of how to choose M (the number of random item-to-parcel allocations), see Sterba and Rights (2016).

3 One may wonder why researchers would want or need to parcel if they can estimate an item-level model, but we do not take up this matter (see Bandalos, 2008, and Situation 3 and its response below) because our purpose here is to evaluate the implications of Situation 2 for PAV.

4 The test used will be described shortly.

5 The pooled PAV distribution for samples that passed the item-level test in a given cell of the simulation is computed as follows. Take each sample's within-sample across-allocation distribution of parcel-solution RMSEAs and subtract that sample's mean parcel-solution RMSEA. Pool these sample-mean-centered parcel-solution RMSEAs across samples, yielding what is essentially a pooled-within distribution of parcel-solution RMSEAs. Finally, add back the cell mean of the parcel-solution RMSEAs.
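A minimal sketch of this pooling step, assuming one already has, for each retained sample, the vector of parcel-solution RMSEAs across its M allocations (the function name and inputs are ours, not the article's):

```python
import numpy as np

def pooled_pav_distribution(rmsea_by_sample):
    """Pool within-sample PAV distributions of parcel-solution RMSEAs.

    rmsea_by_sample: list of 1-D arrays, one per retained sample, each holding
    the RMSEAs from that sample's M item-to-parcel allocations.
    """
    rmseas = [np.asarray(r, dtype=float) for r in rmsea_by_sample]
    # Center each sample's across-allocation RMSEAs at that sample's mean ...
    pooled_within = np.concatenate([r - r.mean() for r in rmseas])
    # ... then shift the pooled-within distribution back to the cell mean.
    cell_mean = np.mean([r.mean() for r in rmseas])
    return pooled_within + cell_mean
```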

6 As mentioned earlier, Marsh et al. (2013) preferred a final step to make their testing procedure even more stringent: inspecting structural covariances to see if they meaningfully differ between the EFA and CFA solutions. But they did not supply an objective way to operationalize this aspect of the procedure, and so it was not implemented here.

7 Sterba and Preacher (in prep) discuss how and when to make such adjustments to the χ2 test of absolute fit.

8 The same pattern of results held for Rhemtulla’s generating parameters from her Model 3b. Here, we used a similar set of generating parameters: standardized latent regression paths were 0.4, except for the direct effect of the x-factor on the y-factor, which was 0.35; standardized item loadings were 0.4. We used these generating parameters because the larger effect sizes in the original source led to a ceiling on power at 1.0 when we implemented our alternative testing approach (described subsequently), but not when implementing the originally reported approach, which made our subsequent comparisons less clear visually. Residual variances were chosen to render the paths standardized; note that in the original source the residual variance for the y-factor should be 0.3615 rather than 0.46.
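For readers who wish to reproduce the standardization, the following worked equations are our reconstruction, not formulas given in the article; they assume the x → m → y mediation structure with a direct x → y path implied by the c-path and indirect effect discussed in notes 11 and 15, with the exogenous x-factor variance fixed at 1.

```latex
% a = x -> m path, b = m -> y path, c' = direct x -> y path (all standardized)
\psi_m = 1 - a^2, \qquad
\psi_y = 1 - \bigl(b^2 + c'^2 + 2\,a\,b\,c'\bigr).
% With a = b = 0.4 and c' = 0.35 as used here:
% psi_m = 0.84 and psi_y = 1 - (0.16 + 0.1225 + 0.112) = 0.6055.
```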

9 The same logic applies to, for instance, RMSEA, which is a function of the χ2, but we do not repeat our demonstration for multiple fit indices here.
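For reference, one standard sample formula (not quoted from the article; software conventions differ as to whether N or N − 1 appears in the denominator) makes the dependence on the χ2 explicit:

```latex
\mathrm{RMSEA} \;=\; \sqrt{\max\!\left(\frac{\chi^2 - df}{df\,(N-1)},\; 0\right)}
```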

10 Note that what we term empirical power for the χ2 test of absolute fit differs from what Yuan, Zhang, and Zhao (2017) term Monte Carlo power. As explained and illustrated in detail in Sterba and Preacher (in prep), Yuan et al. (2017) use both an empirically generated null distribution and an empirically generated alternative distribution. Power computed using their approach will not mirror the actual power that manifests in real-world practice with low N and high p after researchers collect data and fit their model using standard SEM software. (Standard SEM software by default uses a theoretical null distribution, not an empirically generated null distribution, when performing a χ2 test of absolute fit.)
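A schematic of the empirical-power computation in this sense, judged against a theoretical (central) χ2 null reference as standard SEM software uses by default; simulate_data and fit_chi2 are hypothetical placeholders for the user's own data-generation and model-fitting steps, not functions from the article or any particular package.

```python
from scipy import stats

def empirical_power(simulate_data, fit_chi2, n_reps=1000, alpha=0.05):
    """Proportion of replications rejecting the chi-square test of absolute fit,
    judged against the theoretical central chi-square critical value."""
    rejections = 0
    for _ in range(n_reps):
        chi2_stat, df = fit_chi2(simulate_data())  # fit model to one simulated sample
        if chi2_stat > stats.chi2.ppf(1 - alpha, df):
            rejections += 1
    return rejections / n_reps

# Yuan et al.'s (2017) Monte Carlo power would instead replace the theoretical
# critical value with one taken from an empirically generated null distribution.
```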

11 Although our focus here is not on PAV in structural parameter estimates, note that in this simulation the average across-allocation within-sample range for the point estimate of the direct effect of the x-factor on the y-factor (c-path) was {0.28–0.43} and the average across-allocation within-sample range for the standard error of the direct effect (c-path) was {0.11–0.15}.

12 Such a choice would be fraught. Although the parcel-solution has a theoretical power advantage using the global test of absolute fit for detecting structural misspecifications, it has a disadvantage for detecting measurement model misspecification (e.g., Bandalos, 2002; Hall et al., 1999; Meade & Kroustalis, 2006; Rhemtulla, 2016), and in the likely context where at least a little of both kinds of misspecification co-occur, such advantages and disadvantages could cancel out. Furthermore, although the item-solution has an empirical power advantage using the global test of absolute fit for detecting structural misspecification in the common circumstances of moderate N and large p, it also has elevated type I error under these circumstances [unless adjustments to the χ2 test of absolute fit are made to allow empirical type I error and power to more closely conform with theoretical expectation—see Sterba and Preacher (in prep) for procedures].

13 Regarding the closer correspondence between theoretical and empirical power for detecting the structural misspecification using a χ2 difference test than using the χ2 test of absolute fit in , it can be shown that the empirical inflation of the χ2 statistic under low N and high p is smaller for the χ2 difference test. Note that the empirical power calculation also allows the researcher to quantify and report PAV, as in .

14 From Steiger et al. (1985) Theorem 1, the chi-square test of absolute fit of Model A uses test statistic nF̂(A) ∼ noncentral χ2 with df = vA and noncentrality δA, whereas the chi-square difference test for Model A versus Model B uses test statistic (nF̂(A) − nF̂(B)) ∼ noncentral χ2 with df = (vA − vB) and noncentrality (δA − δB). Here, F̂(A) and F̂(B) are minimized discrepancies for a sample size of n for Models A and B, respectively. Noncentralities δB and δA are population “badness of fit” quantities. If both Model B and Model A are incorrect, the noncentralities for these tests are different (i.e., δA vs. (δA − δB)). However, in the case where Model B is the generating model (as also assumed in Rhemtulla (2016), and in widespread parceling practice when researchers routinely assume no measurement model error and calculate power for a particular parametric structural misspecification), δB = 0 and δA > 0. In this case, the chi-square test of absolute fit of Model A uses test statistic nF̂(A) with df = vA and noncentrality δA, whereas the chi-square difference test for Model A versus B uses test statistic (nF̂(A) − nF̂(B)) with df = (vA − vB) and noncentrality δA. Although the latter two tests use the same noncentrality, that does not imply the same power (see and ) because, although in either test the centers of the null and alternative distributions differ by δA, for a given test the pair of (null and alternative) distributions is pushed left or right depending on the values of vB and vA (see ).
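To see why equal noncentrality with different df implies different power, one can compute theoretical power for both tests from a noncentral chi-square; the df and noncentrality values below are illustrative placeholders, not values from the article.

```python
from scipy import stats

def chi2_power(df, noncentrality, alpha=0.05):
    """Theoretical power of a chi-square test: P(reject | noncentral alternative)."""
    crit = stats.chi2.ppf(1 - alpha, df)            # central (null) critical value
    return stats.ncx2.sf(crit, df, noncentrality)   # tail area of the alternative

delta_A = 20.0                                       # same noncentrality for both tests
power_absolute   = chi2_power(df=60, noncentrality=delta_A)  # absolute fit, df = vA
power_difference = chi2_power(df=3,  noncentrality=delta_A)  # difference test, df = vA - vB
# With equal noncentrality, the lower-df test has higher power: the separation
# delta_A is larger relative to the spread of its null and alternative distributions.
```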

15 For reference purposes, note that 40% of samples from the Situation 3 simulation similarly had PAV in the result of the significance test for the indirect effect. Significance of the indirect effect was determined by creating a 95% Monte Carlo confidence interval for the indirect effect (using the Preacher & Selig, 2012, procedure) for each allocation, and then checking whether that 95% CI included the null-hypothesized value of 0.
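A minimal sketch of such a Monte Carlo confidence interval for one allocation, in the spirit of Preacher and Selig (2012); the coefficient estimates and standard errors below are hypothetical, and for simplicity the sketch treats the two path estimates as uncorrelated (the full procedure can incorporate their sampling covariance).

```python
import numpy as np

def monte_carlo_ci_indirect(a_hat, se_a, b_hat, se_b, n_draws=20000, seed=0):
    """95% Monte Carlo CI for an indirect effect a*b: draw a and b from their
    normal sampling distributions, form the product, take percentile limits."""
    rng = np.random.default_rng(seed)
    a = rng.normal(a_hat, se_a, n_draws)
    b = rng.normal(b_hat, se_b, n_draws)
    return np.percentile(a * b, [2.5, 97.5])

# Hypothetical per-allocation estimates; the indirect effect is "significant"
# for this allocation when the interval excludes the null value of 0.
lo, hi = monte_carlo_ci_indirect(a_hat=0.40, se_a=0.10, b_hat=0.40, se_b=0.11)
significant = not (lo <= 0.0 <= hi)
```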

16 Kam and Meyer (2015) reported the within-sample across-allocation average of the RMSEA to be .06, but did not report the percentage of allocations in which the test of close fit was rejected. Note that even if the standard deviation of the PAV distribution appears small, when the PAV distribution is centered so near the decision threshold of close versus not-close fit, there can be practically meaningful changes in results across allocations, in the form of fit flipping between close and not-close fit (as was shown in ).
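The quantity this note suggests reporting is simple to compute once the parcel-level model has been fit to every allocation; the p-values below are made up for illustration and would in practice come from the SEM software's test of close fit (H0: RMSEA ≤ .05) for each allocation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up close-fit p-values, one per item-to-parcel allocation (M = 100 here).
close_fit_p = rng.uniform(0.01, 0.12, size=100)

# Percentage of allocations in which close fit is rejected at alpha = .05:
# a direct way to report "fit flipping" alongside the average RMSEA.
percent_rejected = 100 * np.mean(close_fit_p < 0.05)
```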
