Theory and Methods

Combining Multiple Observational Data Sources to Estimate Causal Effects

Pages 1540-1554 | Received 14 Jul 2018, Accepted 15 Apr 2019, Published online: 11 Jun 2019
 

Abstract

The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects by combining big main data that have unmeasured confounders with smaller validation data that carry supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators of causal effects, whereas the big main data can, in general, give only error-prone estimators. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiency while preserving the consistency of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators. Supplementary materials for this article are available online.
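One way to picture the idea in the abstract is a control-variate-style adjustment: compute the same error-prone estimator (ignoring the unmeasured confounder) on both the main and the validation data, and use the difference between the two, which should be centered near zero, to reduce the variance of the consistent validation-only estimator. The Python sketch below is illustrative only and is not the article's estimator or the API of the R package. The function names (`ols_effect`, `combine`, `integrative_estimate`), the choice of a linear-regression estimator, the nonparametric bootstrap used for the covariance terms, the treatment of the validation sample as a subset of the main sample, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def ols_effect(y, a, covs):
    """Coefficient on treatment `a` from an OLS fit of y on [1, a, covs]."""
    design = np.column_stack([np.ones_like(y), a, covs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def combine(tau_val, ep_val, ep_main, gamma, v):
    """Adjust the validation estimate by the scaled gap between the
    two error-prone estimates: tau_val - (gamma / v) * (ep_val - ep_main)."""
    return tau_val - (gamma / v) * (ep_val - ep_main)


def integrative_estimate(y, a, x, u, in_val, n_boot=200, seed=0):
    """Return (validation-only estimate, adjusted estimate).

    `u` is used only for rows where `in_val` is True (hypothetical setup:
    the validation sample is a subset of the main sample).
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    def estimates(idx):
        val = idx[in_val[idx]]  # resampled rows that belong to validation data
        # Consistent estimator: adjusts for both x and u (validation rows only).
        tau_val = ols_effect(y[val], a[val], np.column_stack([x[val], u[val]]))
        # Error-prone estimators: adjust for x only, on validation and main data.
        ep_val = ols_effect(y[val], a[val], x[val])
        ep_main = ols_effect(y[idx], a[idx], x[idx])
        return tau_val, ep_val, ep_main

    tau_val, ep_val, ep_main = estimates(np.arange(n))

    # Bootstrap the three estimators jointly to estimate the covariance terms.
    boot = np.array([estimates(rng.integers(0, n, n)) for _ in range(n_boot)])
    diff = boot[:, 1] - boot[:, 2]
    gamma = np.cov(boot[:, 0], diff)[0, 1]  # cov(validation estimate, gap)
    v = diff.var(ddof=1)                    # variance of the gap
    return tau_val, combine(tau_val, ep_val, ep_main, gamma, v)


# Simulated data (assumption): linear outcome model with true effect 1.0.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
u = 0.5 * x + rng.normal(size=n)  # confounder missing from the main data
a = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x + u)))).astype(float)
y = 1.0 * a + x + u + rng.normal(size=n)
in_val = np.zeros(n, dtype=bool)
in_val[:400] = True               # first 400 rows form the validation sample

tau_val, tau_adj = integrative_estimate(y, a, x, u, in_val)
print("validation-only estimate:", tau_val)
print("adjusted estimate:", tau_adj)
```

The adjustment leaves the validation estimate unchanged whenever the two error-prone estimates coincide, and otherwise shifts it in proportion to their estimated covariance with the validation estimator. Any gain in precision depends on how strongly the error-prone and validation estimators are correlated.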

Acknowledgments

We thank the editor, the associate editor, and four anonymous reviewers for suggestions which improved the article significantly. We are grateful to Professor Yi-Hau Chen for providing the data and offering help and advice in interpreting the data. Drs. Lo-Hua Yuan and Xinran Li offered helpful comments. Dr. Yang is partially supported by the National Science Foundation grant DMS 1811245, National Cancer Institute grant P01 CA142538, and Oak Ridge Associated Universities. Dr. Ding is partially supported by the National Science Foundation grant DMS 1713152.

Supplementary materials

The online supplementary material contains technical details and proofs. The R package “Integrative CI”, which implements the proposed estimators, is available at https://github.com/shuyang1987/IntegrativeCI.

Additional information

Funding

Directorate for Mathematical and Physical Sciences; Division of Mathematical Sciences
