Abstract
With clustered data, such as students nested within schools or employees nested within organizations, it is often of interest to estimate and compare associations among variables separately at each level. While researchers routinely estimate between-cluster effects using the sample cluster means of a predictor, previous research has shown that this practice leads to biased estimates of between-level coefficients, and recent research has recommended using latent cluster means within the multilevel structural equation modeling framework. However, the latent cluster mean approach may not always be the best choice, as it (a) relies on the assumption that the population cluster sizes are close to infinite, (b) requires a relatively large number of clusters, and (c) is currently implemented only in specialized software such as Mplus. In this paper, we show that using empirical Bayes estimates of the cluster means (the EBM approach) also leads to consistent estimates of between-level coefficients, and we illustrate how the empirical Bayes estimate can incorporate finite population corrections when information on population cluster sizes is available. Through a series of Monte Carlo simulation studies, we show that the EBM approach performs similarly to the latent cluster mean approach for estimating between-cluster coefficients in most conditions when the infinite-population assumption holds, and that applying the finite population correction provides reasonable point and interval estimates when the population is finite. The performance of EBM can be further improved with restricted maximum likelihood estimation and likelihood-based confidence intervals. We also provide an R function that implements the EBM approach and illustrate it using data from the classic High School and Beyond Study.
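The core idea summarized above can be illustrated with a short sketch. Each sample cluster mean is shrunk toward the grand mean by a reliability weight. When population cluster sizes N_j are known, the sampling variance of the sample mean is corrected. The shrunken means then replace the raw cluster means as the between-level predictor.

This is a minimal Python sketch under stated assumptions, not the authors' R function listed in the Notes. It assumes the following:

- normally distributed predictors;
- one-way ANOVA (method-of-moments) variance components rather than ML/REML;
- a finite-population adjustment in which the population cluster mean has prior variance τ² + σ²/N_j and the sample mean has sampling variance σ²(1/n_j − 1/N_j) around it.

The functions `eb_cluster_means` and `between_slope` are hypothetical names. Ordinary least squares stands in for a multilevel model, so only the point estimates, not the standard errors, are meaningful.

```python
import numpy as np


def eb_cluster_means(x, cluster, pop_size=None):
    """Sample cluster means (CM) and empirical Bayes cluster means (EBM).

    Hypothetical helper; a sketch, not the authors' implementation.
    x        : 1-D array of a level-1 predictor
    cluster  : 1-D array of integer cluster labels 0..J-1
    pop_size : optional length-J array of population cluster sizes N_j;
               None assumes infinite populations (no correction)
    Returns per-observation arrays (cm, ebm).
    """
    x = np.asarray(x, dtype=float)
    cluster = np.asarray(cluster)
    n_j = np.bincount(cluster).astype(float)          # sample cluster sizes
    xbar_j = np.bincount(cluster, weights=x) / n_j    # sample cluster means
    J = n_j.size

    # Assumed method-of-moments (one-way ANOVA) variance components
    resid = x - xbar_j[cluster]
    sigma2 = np.sum(resid ** 2) / (x.size - J)        # within-cluster variance
    tau2 = max(np.var(xbar_j, ddof=1) - np.mean(sigma2 / n_j), 0.0)
    mu = np.mean(xbar_j)                              # grand mean (unweighted)

    # 1/N_j is 0 for infinite populations
    inv_N = np.zeros(J) if pop_size is None else 1.0 / np.asarray(pop_size, float)

    # Assumed finite-population adjustment: prior variance of the population
    # cluster mean, and sampling variance of the sample mean around it
    prior_var = tau2 + sigma2 * inv_N
    samp_var = sigma2 * (1.0 / n_j - inv_N)
    reliability = prior_var / (prior_var + samp_var)  # shrinkage weight lambda_j

    ebm_j = mu + reliability * (xbar_j - mu)
    return xbar_j[cluster], ebm_j[cluster]


def between_slope(y, x, m):
    """OLS coefficient on the cluster-mean predictor m in y ~ 1 + (x - m) + m."""
    design = np.column_stack([np.ones_like(x), x - m, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]


# Illustrative simulation (all values are arbitrary choices for this sketch)
rng = np.random.default_rng(2024)
J, n = 4000, 5
beta_w, beta_b = 0.2, 0.8
cluster = np.repeat(np.arange(J), n)
X_j = rng.normal(0.0, np.sqrt(0.2), J)                    # true cluster means, tau2 = 0.2
x = X_j[cluster] + rng.normal(0.0, np.sqrt(0.8), J * n)   # sigma2 = 0.8
y = (beta_w * (x - X_j[cluster]) + beta_b * X_j[cluster]
     + rng.normal(0.0, np.sqrt(0.1), J)[cluster]          # random intercept
     + rng.normal(0.0, np.sqrt(0.5), J * n))              # level-1 residual

cm, ebm = eb_cluster_means(x, cluster)
b_cm = between_slope(y, x, cm)
b_ebm = between_slope(y, x, ebm)
print(f"CM: {b_cm:.3f}  EBM: {b_ebm:.3f}  (true {beta_b})")
```

For balanced clusters as simulated here, the CM coefficient converges to β_W + λ(β_B − β_W), where λ = τ²/(τ² + σ²/n) ≈ 0.56. CM should therefore print a value near 0.53, and EBM should print a value near 0.8. Exact digits depend on the seed. Passing `pop_size=np.full(J, n)` treats each sample as the entire cluster. This sets every λ_j to 1, so EBM reduces to CM.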
Article information
Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.
Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.
Funding: This work was supported by Grant 2141790 from the National Science Foundation.
Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Acknowledgments: The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors’ institutions or the National Science Foundation is not intended and should not be inferred.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Open Scholarship
This article has earned the Center for Open Science badge for Open Materials. The materials are openly accessible at https://github.com/marklhc/ebm-supp.
Notes
1 The R function and the supplemental results can be found at https://github.com/marklhc/ebm-supp.
2 Another popular R package for SEM, lavaan, currently only supports models without random slopes.
3 Cheung (2013) discussed ways to implement REML in the SEM framework using a transformation matrix or a modified fitting function.
4 Essentially the same procedure was proposed by Croon and van Veldhoven (2007), but in the context of predicting a between-level outcome.
5 For example, a quick survey of recent MLM textbooks used in the social and behavioral sciences (Heck & Thomas, 2020; Hox et al., 2018; Luke, 2020; Snijders & Bosker, 2012) found discussions only of CM, not of EBM.
6 However, this does not control for different software using different numerical algorithms and convergence criteria to find ML solutions.
7 For example, when = 0.05 and
= 0.56, so the expected bias is 0.56.