Multilevel Design Parameters to Plan Cluster-Randomized Intervention Studies on Student Achievement in Elementary and Secondary School

Pages 172–206 | Received 03 Mar 2020, Accepted 10 Sep 2020, Published online: 22 Jan 2021

References

  • American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33–40. https://doi.org/10.3102/0013189X035006033
  • American Psychological Association (Ed.). (2019). Publication manual of the American Psychological Association (7th ed.). American Psychological Association.
  • Aßmann, C., Steinhauer, H. W., Kiesl, H., Koch, S., Schönberger, B., Müller-Kuller, A., Rohwer, G., Rässler, S., & Blossfeld, H.-P. (2011). Sampling designs of the National Educational Panel Study: Challenges and solutions. Zeitschrift für Erziehungswissenschaft, 14(S2), 51–65. https://doi.org/10.1007/s11618-011-0181-8
  • Baumert, J., Köller, O., Lehrke, M., & Brockmann, J. (2000). Anlage und Durchführung der Dritten Internationalen Mathematik- und Naturwissenschaftsstudie zur Sekundarstufe II (TIMSS/III)—Technische Grundlagen [Design and implementation of the Third International Mathematics and Science Study for upper secondary education (TIMSS/III)—technical foundations]. In J. Baumert, W. Bos, & R. Lehmann (Eds.), TIMSS/III. Dritte Internationale Mathematik- und Naturwissenschaftsstudie. Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn: Vol. 1: Mathematische und naturwissenschaftliche Grundbildung am Ende der Pflichtschulzeit (pp. 31–84). Leske + Budrich.
  • Baumert, J., Trautwein, U., & Artelt, C. (2003). Schulumwelten—Institutionelle Bedingungen des Lehrens und Lernens [School contexts—institutional conditions for teaching and learning]. In Deutsches PISA-Konsortium (Ed.), PISA 2000. Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 261–331). Leske + Budrich. https://doi.org/10.1007/978-3-322-97590-4_11
  • Beck, B., Bundt, S., & Gomolka, J. (2008). Ziele und Anlage der DESI-Studie [Objectives and design of the DESI study]. In DESI-Konsortium (Ed.), Unterricht und Kompetenzerwerb in Deutsch und Englisch. Ergebnisse der DESI-Studie (pp. 11–25). Beltz.
  • Bloom, H. S. (1995). Minimum detectable effects: A simple way to report the statistical power of experimental designs. Evaluation Review, 19(5), 547–556. https://doi.org/10.1177/0193841X9501900504
  • Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. In H. S. Bloom (Ed.), Learning more from social experiments: Evolving analytic approaches (pp. 115–172). Russell Sage Foundation.
  • Bloom, H. S. (2006). The core analytics of randomized experiments for social research. MDRC Working Papers on Research Methodology. http://www.mdrc.org/sites/default/files/full_533.pdf
  • Bloom, H. S., Bos, J. M., & Lee, S.-W. (1999). Using cluster random assignment to measure program impacts: Statistical implications for the evaluation of education programs. Evaluation Review, 23(4), 445–469. https://doi.org/10.1177/0193841X9902300405
  • Bloom, H. S., Raudenbush, S. W., Weiss, M. J., & Porter, K. (2017). Using multisite experiments to study cross-site variation in treatment effects: A hybrid approach with fixed intercepts and a random treatment coefficient. Journal of Research on Educational Effectiveness, 10(4), 817–842. https://doi.org/10.1080/19345747.2016.1264518
  • Bloom, H. S., Richburg-Hayes, L., & Black, A. R. (2007). Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educational Evaluation and Policy Analysis, 29(1), 30–59. https://doi.org/10.3102/0162373707299550
  • Bloom, H. S., & Spybrook, J. (2017). Assessing the precision of multisite trials for estimating the parameters of a cross-site population distribution of program effects. Journal of Research on Educational Effectiveness, 10(4), 877–902. https://doi.org/10.1080/19345747.2016.1271069
  • Bloom, H. S., Zhu, P., Jacob, R., Raudenbush, S. W., Martinez, A., & Lin, F. (2008). Empirical issues in the design of group-randomized studies to measure the effects of interventions for children. MDRC Working Papers on Research Methodology. https://www.mdrc.org/sites/default/files/full_85.pdf
  • Blossfeld, H.-P., Roßbach, H.-G., & von Maurice, J. (Eds.). (2011). Education as a lifelong process: The German National Educational Panel Study (NEPS). VS Verlag für Sozialwissenschaften.
  • Böhme, K., & Weirich, S. (2012). Der Ländervergleich im Fach Deutsch [National Assessment Study in German]. In P. Stanat, H. A. Pant, K. Böhme, & D. Richter (Eds.), Kompetenzen von Schülerinnen und Schülern am Ende der vierten Jahrgangsstufe in den Fächern Deutsch und Mathematik. Ergebnisse des IQB-Ländervergleichs 2011 (pp. 103–116). Waxmann.
  • Boruch, R. F., & Foley, E. (2000). The honestly experimental society: Sites and other entities as the units of allocation and analysis in randomized trials. In L. Bickman (Ed.), Validity and social experimentation: Donald Campbell’s legacy (pp. 193–239). SAGE.
  • Brandon, P. R., Harrison, G. M., & Lawton, B. E. (2013). SAS code for calculating intraclass correlation coefficients and effect size benchmarks for site-randomized education experiments. American Journal of Evaluation, 34(1), 85–90. https://doi.org/10.1177/1098214012466453
  • Brunner, M., Keller, U., Wenger, M., Fischbach, A., & Lüdtke, O. (2018). Between-school variation in students’ achievement, motivation, affect, and learning strategies: Results from 81 countries for planning group-randomized trials in education. Journal of Research on Educational Effectiveness, 11(3), 452–478. https://doi.org/10.1080/19345747.2017.1375584
  • Bulus, M., Dong, N., Kelcey, B., & Spybrook, J. (2019). PowerUpR: Power analysis tools for multilevel randomized experiments. R package version 1.0.4. https://CRAN.R-project.org/package=PowerUpR
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Cook, T. D. (2005). Emergent principles for the design, implementation, and analysis of cluster-based experiments in social science. Annals of the American Academy of Political and Social Science, 599(1), 176–198. https://doi.org/10.1177/0002716205275738
  • DESI-Konsortium (Ed.). (2008). Unterricht und Kompetenzerwerb in Deutsch und Englisch: Ergebnisse der DESI-Studie [Teaching and acquisition of competencies in German and English as a foreign language: Results from the DESI study]. Beltz.
  • Dong, N., & Maynard, R. (2013). PowerUp!: A tool for calculating minimum detectable effect sizes and minimum required sample sizes for experimental and quasi-experimental design studies. Journal of Research on Educational Effectiveness, 6(1), 24–67. https://doi.org/10.1080/19345747.2012.673143
  • Donner, A., & Klar, N. (2000). Design and analysis of cluster randomization trials in health research. Hodder Education.
  • Donner, A., & Koval, J. J. (1980). The large sample variance of an intraclass correlation. Biometrika, 67(3), 719–722. https://doi.org/10.1093/biomet/67.3.719
  • Ganzeboom, H. B. G., & Treiman, D. J. (1996). Internationally comparable measures of occupational status for the 1988 international standard classification of occupations. Social Science Research, 25(3), 201–239. https://doi.org/10.1006/ssre.1996.0010
  • Gersten, R., Rolfhus, E., Clarke, B., Decker, L. E., Wilkins, C., & Dimino, J. (2015). Intervention for first graders with limited number knowledge: Large-scale replication of a randomized controlled trial. American Educational Research Journal, 52(3), 516–546. https://doi.org/10.3102/0002831214565787
  • Grund, S., Robitzsch, A., & Lüdtke, O. (2019). mitml: Tools for multiple imputation in multilevel modeling. R package version 0.3-7. https://CRAN.R-project.org/package=mitml
  • Haag, N., & Roppelt, A. (2012). Der Ländervergleich im Fach Mathematik [National Assessment Study in mathematics]. In P. Stanat, H. A. Pant, K. Böhme, & D. Richter (Eds.), Kompetenzen von Schülerinnen und Schülern am Ende der vierten Jahrgangsstufe in den Fächern Deutsch und Mathematik. Ergebnisse des IQB-Ländervergleichs 2011 (pp. 117–127). Waxmann.
  • Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621–638. https://doi.org/10.1080/10705511.2017.1402334
  • Hedberg, E. C., Santana, R., & Hedges, L. V. (2004). The variance structure of academic achievement in America [Paper presentation]. Annual meeting of the American Educational Research Association, San Diego, CA.
  • Hedges, L. V., & Hedberg, E. C. (2007a). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87. https://doi.org/10.3102/0162373707299706
  • Hedges, L. V., & Hedberg, E. C. (2007b). Intraclass correlations for planning group randomized experiments in rural education. Journal of Research in Rural Education, 22(10), 1–15. http://jrre.vmhost.psu.edu/wp-content/uploads/2014/02/22-10.pdf
  • Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445–489. https://doi.org/10.1177/0193841X14529126
  • Hedges, L. V., Hedberg, E. C., & Kuyper, A. M. (2012). The variance of intraclass correlations in three- and four-level models. Educational and Psychological Measurement, 72(6), 893–909. https://doi.org/10.1177/0013164412445193
  • Hedges, L. V., & Rhoads, C. (2010). Statistical power analysis in education research. National Center for Special Education Research. https://ies.ed.gov/ncser/pubs/20103006/pdf/20103006.pdf
  • Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504. https://doi.org/10.1037/1082-989X.3.4.486
  • Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2(3), 172–177. https://doi.org/10.1111/j.1750-8606.2008.00061.x
  • Institute of Education Sciences & National Science Foundation. (2013). Common guidelines for education research and development. https://www.nsf.gov/pubs/2013/nsf13126/nsf13126.pdf
  • Jacob, R. T., Zhu, P., & Bloom, H. S. (2010). New empirical evidence for the design of group randomized trials in education. Journal of Research on Educational Effectiveness, 3(2), 157–198. https://doi.org/10.1080/19345741003592428
  • Kelcey, B., Shen, Z., & Spybrook, J. (2016). Intraclass correlation coefficients for designing cluster-randomized trials in Sub-Saharan Africa education. Evaluation Review, 40(6), 500–525. https://doi.org/10.1177/0193841X16660246
  • Knigge, M., & Köller, O. (2010). Effekte der sozialen Zusammensetzung der Schülerschaft [Effects of the social composition of the student body]. In O. Köller, M. Knigge, & B. Tesch (Eds.), Sprachliche Kompetenzen im Ländervergleich (pp. 227–244). Waxmann.
  • Konstantopoulos, S. (2008a). The power of the test for treatment effects in three-level cluster randomized designs. Journal of Research on Educational Effectiveness, 1(1), 66–88. https://doi.org/10.1080/19345740701692522
  • Konstantopoulos, S. (2008b). The power of the test for treatment effects in three-level block randomized designs. Journal of Research on Educational Effectiveness, 1(4), 265–288. https://doi.org/10.1080/19345740802328216
  • Konstantopoulos, S. (2012). The impact of covariates on statistical power in cluster randomized designs: Which level matters more? Multivariate Behavioral Research, 47(3), 392–420. https://doi.org/10.1080/00273171.2012.673898
  • Kultusministerkonferenz. (2015). Gesamtstrategie der Kultusministerkonferenz zum Bildungsmonitoring [Overall strategy of the Standing Conference of the Ministers of Education and Cultural Affairs for educational monitoring]. https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2015/2015_06_11-Gesamtstrategie-Bildungsmonitoring.pdf
  • LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11(4), 815–852. https://doi.org/10.1177/1094428106296642
  • Lehmann, R., & Lenkeit, J. (2008). ELEMENT. Erhebung zum Lese- und Mathematikverständnis. Entwicklungen in den Jahrgangsstufen 4 bis 6 in Berlin. Abschlussbericht über die Untersuchungen 2003, 2004 und 2005 an Berliner Grundschulen und grundständigen Gymnasien [ELEMENT: Study of reading and mathematics literacy. Development from grades 4 to 6 in Berlin. Final research report on the 2003, 2004, and 2005 assessments at Berlin primary schools and early-entry academic-track secondary schools]. Humboldt-Universität zu Berlin. https://www.researchgate.net/profile/Jenny_Lenkeit/publication/273380369_ELEMENT_Erhebung_zum_Lese-_und_Mathematik-verstandnis_-_Entwicklungen_in_den_Jahrgangsstufen_4_bis_6_in_Berlin_Abschlussbericht_uber_die_Untersuchungen_2003_2004_und_2005_an_Berliner_Grundschulen_und_/links/553f61600cf23e796bfb38c2.pdf?origin=publication_detail
  • Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research. http://eric.ed.gov/?id=ED537446
  • Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13(3), 203–229. https://doi.org/10.1037/a0012869
  • Martin, M. O., Foy, P., Mullis, I. V. S., & O’Dwyer, L. M. (2013). Effective schools in reading, mathematics, and science at the fourth grade. In M. O. Martin & I. V. S. Mullis (Eds.), TIMSS and PIRLS 2011: Relationships among reading, mathematics, and science achievement at the fourth grade—implications for early learning (pp. 109–179). TIMSS & PIRLS International Study Center, Boston College. https://timssandpirls.bc.edu/timsspirls2011/downloads/TP11_Relationship_Report.pdf
  • Martin, M. O., Mullis, I. V. S., Gregory, K. D., Hoyle, C., & Shen, C. (2000). Effective schools in science and mathematics: IEA’s Third International Mathematics and Science Study. International Study Center, Boston College. https://timssandpirls.bc.edu/timss1995i/TIMSSPDF/T95_EffSchool.pdf
  • Murray, D. M. (1998). Design and analysis of group-randomized trials. Oxford University Press.
  • Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
  • National Research Council (Ed.). (2011). Assessing 21st century skills: Summary of a workshop. National Academies Press. https://doi.org/10.17226/13215
  • Organisation for Economic Co-operation and Development. (2007). Evidence in education: Linking research and policy. OECD Publishing. https://doi.org/10.1787/9789264033672-en
  • Organisation for Economic Co-operation and Development. (2017). Social and emotional skills: Well-being, connectedness and success. OECD Publishing. http://www.oecd.org/education/school/UPDATED%20Social%20and%20Emotional%20Skills%20-%20Well-being,%20connectedness%20and%20success.pdf%20(website).pdf
  • Organisation for Economic Co-operation and Development. (2018). The future of education and skills. OECD Publishing. https://www.oecd.org/education/2030-project/about/documents/E2030%20Position%20Paper%20(05.04.2018).pdf
  • PISA-Konsortium Deutschland (Ed.). (2006). PISA 2003. Untersuchungen zur Kompetenzentwicklung im Verlauf eines Schuljahres [PISA 2003. Investigating competence development throughout one school year]. Waxmann.
  • Prenzel, M., Carstensen, C. H., Schöps, K., & Maurischat, C. (2006). Die Anlage des Längsschnitts bei PISA 2003 [The longitudinal design of PISA 2003]. In PISA-Konsortium Deutschland (Ed.), PISA 2003. Untersuchungen zur Kompetenzentwicklung im Verlauf eines Schuljahres (pp. 29–62). Waxmann.
  • R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org
  • Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2(2), 173–185. https://doi.org/10.1037/1082-989X.2.2.173
  • Raudenbush, S. W., & Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5(2), 199–213. https://doi.org/10.1037/1082-989X.5.2.199
  • Raudenbush, S. W., Martínez, A., & Spybrook, J. (2007). Strategies for improving precision in group-randomized experiments. Educational Evaluation and Policy Analysis, 29(1), 5–29. https://doi.org/10.3102/0162373707299460
  • Robitzsch, A., Grund, S., & Henke, T. (2018). miceadds: Some additional multiple imputation functions, especially for mice. R package version 2.15-6. https://CRAN.R-project.org/package=miceadds
  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
  • Salchegger, S. (2016). Selective school systems and academic self-concept: How explicit and implicit school-level tracking relate to the big-fish-little-pond effect across cultures. Journal of Educational Psychology, 108(3), 405–423. https://doi.org/10.1037/edu0000063
  • Schochet, P. Z. (2008). Statistical power for random assignment evaluations of education programs. Journal of Educational and Behavioral Statistics, 33(1), 62–87. https://doi.org/10.3102/1076998607302714
  • Schochet, P. Z., Puma, M., & Deke, J. (2014). Understanding variation in treatment effects in education impact evaluations: An overview of quantitative methods. Institute of Education Sciences (IES). https://ies.ed.gov/ncee/pubs/20144017/pdf/20144017.pdf
  • Senkbeil, M. (2006). Die Bedeutung schulischer Faktoren für die Kompetenzentwicklung in Mathematik und in den Naturwissenschaften [The relevance of school context factors for competence development in mathematics and science]. In PISA-Konsortium Deutschland (Ed.), PISA 2003. Untersuchungen zur Kompetenzentwicklung im Verlauf eines Schuljahres (pp. 277–308). Waxmann.
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Company.
  • Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15–21. https://doi.org/10.3102/0013189X031007015
  • Spybrook, J. (2013). Introduction to special issue on design parameters for cluster randomized trials in education. Evaluation Review, 37(6), 435–444. https://doi.org/10.1177/0193841X14527758
  • Spybrook, J., & Kelcey, B. (2016). Introduction to three special issues on design parameter values for planning cluster randomized trials in the social sciences. Evaluation Review, 40(6), 491–499. https://doi.org/10.1177/0193841X16685646
  • Spybrook, J., & Raudenbush, S. W. (2009). An examination of the precision and technical accuracy of the first wave of group-randomized trials funded by the Institute of Education Sciences. Educational Evaluation and Policy Analysis, 31(3), 298–318. https://doi.org/10.3102/0162373709339524
  • Spybrook, J., Shi, R., & Kelcey, B. (2016). Progress in the past decade: An examination of the precision of cluster randomized trials funded by the U.S. Institute of Education Sciences. International Journal of Research & Method in Education, 39(3), 255–267. https://doi.org/10.1080/1743727X.2016.1150454
  • Spybrook, J., Westine, C. D., & Taylor, J. A. (2016). Design parameters for impact research in science education: A multistate analysis. AERA Open, 2(1), 1–15. https://doi.org/10.1177/2332858415625975
  • van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
  • Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
  • Weiss, M. J., Bloom, H. S., & Brock, T. (2014). A conceptual framework for studying the sources of variation in program effects. Journal of Policy Analysis and Management, 33(3), 778–808. https://doi.org/10.1002/pam.21760
  • Weiss, M. J., Bloom, H. S., Verbitsky-Savitz, N., Gupta, H., Vigil, A. E., & Cullinan, D. N. (2017). How much do the effects of education and training programs vary across sites? Evidence from past multisite randomized trials. Journal of Research on Educational Effectiveness, 10(4), 843–876. https://doi.org/10.1080/19345747.2017.1300719
  • Wenger, M., Lüdtke, O., & Brunner, M. (2018). Übereinstimmung, Variabilität und Reliabilität von Schülerurteilen zur Unterrichtsqualität auf Schulebene: Ergebnisse aus 81 Ländern [Agreement, variability, and reliability of student ratings of instructional quality at the school level: Results from 81 countries]. Zeitschrift für Erziehungswissenschaft, 21(5), 929–950. https://doi.org/10.1007/s11618-018-0813-3
  • Westine, C. D., Spybrook, J., & Taylor, J. A. (2013). An empirical investigation of variance design parameters for planning cluster-randomized trials of science achievement. Evaluation Review, 37(6), 490–519. https://doi.org/10.1177/0193841X14531584
  • World Economic Forum. (2015). New vision for education. Unlocking the potential of technology. http://www3.weforum.org/docs/WEFUSA_NewVisionforEducation_Report2015.pdf
  • Xu, Z., & Nichols, A. (2010). New estimates of design parameters for clustered randomization studies: Findings from North Carolina and Florida. National Center for Analysis of Longitudinal Data in Education. http://www.urban.org/sites/default/files/alfresco/publication-pdfs/1001394-New-Estimates-of-Design-Parameters-for-Clustered-Randomization-Studies.pdf
  • Zhu, P., Jacob, R., Bloom, H., & Xu, Z. (2012). Designing and analyzing studies that randomize schools to estimate intervention effects on student academic outcomes without classroom-level information. Educational Evaluation and Policy Analysis, 34(1), 45–68. https://doi.org/10.3102/0162373711423786
  • Zopluoglu, C. (2012). A cross-national comparison of intra-class correlation coefficient in educational achievement outcomes. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi [Journal of Measurement and Evaluation in Education and Psychology], 3(1), 233–270.