Original Articles

An Evaluation of Weighting Methods Based on Propensity Scores to Reduce Selection Bias in Multilevel Observational Studies
