Research Articles

Correcting for Sampling Error in between-Cluster Effects: An Empirical Bayes Cluster-Mean Approach with Finite Population Corrections

Pages 584-598 | Published online: 13 Feb 2024
 

Abstract

With clustered data, such as students nested within schools or employees nested within organizations, it is often of interest to estimate and compare associations among variables separately for each level. While researchers routinely estimate between-cluster effects using the sample cluster means of a predictor, previous research has shown that this practice leads to biased estimates of coefficients at the between level, and recent research has recommended the use of latent cluster means within the multilevel structural equation modeling framework. However, the latent cluster mean approach may not always be the best choice, as it (a) relies on the assumption that the population cluster sizes are close to infinite, (b) requires a relatively large number of clusters, and (c) is currently only implemented in specialized software such as Mplus. In this paper, we show how using empirical Bayes estimates of the cluster means can also lead to consistent estimates of between-level coefficients, and illustrate how the empirical Bayes estimate can incorporate finite population corrections when information on population cluster sizes is available. Through a series of Monte Carlo simulation studies, we show that the empirical Bayes cluster-mean (EBM) approach performs similarly to the latent cluster mean approach for estimating the between-cluster coefficients in most conditions when the infinite-population assumption holds, and that applying the finite population correction provides reasonable point and interval estimates when the population is finite. The performance of EBM can be further improved with restricted maximum likelihood estimation and likelihood-based confidence intervals. We also provide an R function that implements the empirical Bayes cluster-mean approach, and illustrate it using data from the classic High School and Beyond Study.
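To make the idea of a shrunken cluster mean with a finite population correction concrete, the following is a minimal Python sketch. It is not the authors' R function. It assumes the textbook shrinkage form, in which each sample cluster mean is pulled toward the grand mean by a reliability weight, and it assumes the correction takes the usual survey-sampling form of multiplying the sampling variance of the cluster mean by 1 - n_j / N_j. For simplicity it estimates the variance components by the method of moments rather than by the ML or REML estimation discussed in the abstract. The names `eb_cluster_means` and `pop_sizes` are hypothetical.

```python
import numpy as np

def eb_cluster_means(x, cluster, pop_sizes=None):
    """Shrunken (empirical Bayes style) cluster means of a predictor.

    Illustrative sketch only: variance components are estimated by the
    method of moments, not by ML/REML. `pop_sizes` (hypothetical name)
    gives the population size N_j of each cluster, ordered by sorted
    cluster id; if None, clusters are treated as infinitely large.
    """
    x = np.asarray(x, dtype=float)
    ids, idx = np.unique(np.asarray(cluster), return_inverse=True)
    J = len(ids)

    # Sample size and sample mean of each cluster
    n = np.bincount(idx, minlength=J).astype(float)
    xbar = np.bincount(idx, weights=x, minlength=J) / n

    # Pooled within-cluster variance
    sigma2 = np.sum((x - xbar[idx]) ** 2) / (x.size - J)

    # Assumed finite population correction: 1 - n_j / N_j
    if pop_sizes is None:
        fpc = np.ones(J)
    else:
        fpc = 1.0 - n / np.asarray(pop_sizes, dtype=float)

    # Sampling variance of each sample cluster mean
    samp_var = fpc * sigma2 / n

    # Method-of-moments between-cluster variance, truncated at zero
    tau2 = max(np.var(xbar, ddof=1) - samp_var.mean(), 0.0)

    # Reliability weight; set to 1 when there is no sampling error
    denom = tau2 + samp_var
    lam = np.divide(tau2, denom, out=np.ones(J), where=denom > 0)

    # Shrink each cluster mean toward the unweighted grand mean
    grand = xbar.mean()
    eb = grand + lam * (xbar - grand)
    return ids, eb, lam
```

When a cluster is fully sampled (n_j = N_j), the assumed correction sets its sampling variance to zero, so the weight is 1 and the shrunken mean equals the sample mean. The shrunken means would then replace the sample cluster means as the between-level predictor in a second-stage multilevel model.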

Article information

Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was supported by Grant 2141790 from the National Science Foundation.

Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors’ institutions or the National Science Foundation is not intended and should not be inferred.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Open Scholarship

This article has earned the Center for Open Science badges for Open Materials. The materials are openly accessible at https://github.com/marklhc/ebm-supp.

Notes

1 The R function and the supplemental results can be found at https://github.com/marklhc/ebm-supp.

2 Another popular R package for SEM, lavaan, currently only supports models without random slopes.

3 Cheung (2013) discussed ways to implement REML in the SEM framework using a transformation matrix or a modified fitting function.

4 Essentially the same procedure was proposed by Croon and van Veldhoven (2007), but in the context of predicting a between-level outcome.

5 For example, a quick survey of recent MLM textbooks used in social and behavioral sciences (Heck & Thomas, 2020; Hox et al., 2018; Luke, 2020; Snijders & Bosker, 2012) found only discussions of CM, but not EBM.

6 However, this does not control for different software using different numerical algorithms and convergence criteria to find ML solutions.

7 For example, when $\tau_X^2 = 0.05$ and $\bar{n} = 25$, $\lambda_{Xj} = 0.56$, so the expected bias is 0.56.
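As a rough check of the reliability value quoted in Note 7, the arithmetic can be written out under two assumptions that the note itself does not state: that $\lambda_{Xj}$ is the usual reliability of a sample cluster mean, and that the within-cluster variance of the predictor is $\sigma_X^2 = 1$.

```latex
\lambda_{Xj}
  = \frac{\tau_X^2}{\tau_X^2 + \sigma_X^2 / \bar{n}}
  = \frac{0.05}{0.05 + 1/25}
  = \frac{0.05}{0.09}
  \approx 0.56
```

Under these assumptions the result matches the value in the note.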

