Theory and Methods

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Received 10 Nov 2022, Accepted 07 May 2024, Published online: 24 Jun 2024

References

  • Bang, H., and Robins, J. M. (2005), “Doubly Robust Estimation in Missing Data and Causal Inference Models,” Biometrics, 61, 962–973. DOI: 10.1111/j.1541-0420.2005.00377.x.
  • Bastani, H. (2021), “Predicting with Proxies: Transfer Learning in High Dimension,” Management Science, 67, 2964–2984. DOI: 10.1287/mnsc.2020.3729.
  • Bühlmann, P., van de Geer, S., et al. (2015), “High-Dimensional Inference in Misspecified Linear Models,” Electronic Journal of Statistics, 9, 1449–1473. DOI: 10.1214/15-EJS1041.
  • Cai, T., Li, M., and Liu, M. (2022), “Semi-Supervised Triply Robust Inductive Transfer Learning,” arXiv preprint arXiv:2209.04977.
  • Carroll, R. J., Fan, J., Gijbels, I., and Wand, M. P. (1997), “Generalized Partially Linear Single-Index Models,” Journal of the American Statistical Association, 92, 477–489. DOI: 10.1080/01621459.1997.10474001.
  • Castro, V. M., Gainer, V., Wattanasin, N., Benoit, B., Cagan, A., Ghosh, B., Goryachev, S., Metta, R., Park, H., Wang, D., et al. (2022), “The Mass General Brigham Biobank Portal: An i2b2-based Data Repository Linking Disparate and High-Dimensional Patient Data to Support Multimodal Analytics,” Journal of the American Medical Informatics Association, 29, 643–651. DOI: 10.1093/jamia/ocab264.
  • Chakrabortty, A., and Cai, T. (2018), “Efficient and Adaptive Linear Regression in Semi-Supervised Settings,” The Annals of Statistics, 46, 1541–1572. DOI: 10.1214/17-AOS1594.
  • Chernozhukov, V., Chetverikov, D., and Kato, K. (2013), “Gaussian Approximations and Multiplier Bootstrap for Maxima of Sums of High-Dimensional Random Vectors,” The Annals of Statistics, 41, 2786–2819. DOI: 10.1214/13-AOS1161.
  • Chernozhukov, V., Newey, W. K., and Singh, R. (2022), “Debiased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers,” The Econometrics Journal, 25, 576–601. DOI: 10.1093/ectj/utac002.
  • Dukes, O., and Vansteelandt, S. (2021), “Inference for Treatment Effect Parameters in Potentially Misspecified High-Dimensional Models,” Biometrika, 108, 321–334. DOI: 10.1093/biomet/asaa071.
  • Gaziano, J. M., Concato, J., Brophy, M., Fiore, L., Pyarajan, S., Breeling, J., Whitbourne, S., Deen, J., Shannon, C., Humphries, D., et al. (2016), “Million Veteran Program: A Mega-Biobank to Study Genetic Influences on Health and Disease,” Journal of Clinical Epidemiology, 70, 214–223. DOI: 10.1016/j.jclinepi.2015.09.016.
  • Gerds, T. A., Cai, T., and Schumacher, M. (2008), “The Performance of Risk Prediction Models,” Biometrical Journal: Journal of Mathematical Methods in Biosciences, 50, 457–479. DOI: 10.1002/bimj.200810443.
  • Geva, A., Liu, M., Panickan, V. A., Avillach, P., Cai, T., and Mandl, K. D. (2021), “A High-Throughput Phenotyping Algorithm is Portable From Adult to Pediatric Populations,” Journal of the American Medical Informatics Association, 28, 1265–1269. DOI: 10.1093/jamia/ocaa343.
  • Ghosh, S., and Tan, Z. (2022), “Doubly Robust Semiparametric Inference Using Regularized Calibrated Estimation with High-Dimensional Data,” Bernoulli, 28, 1675–1703. DOI: 10.3150/21-BEJ1378.
  • Gronsbell, J., Liu, M., Tian, L., and Cai, T. (2022), “Efficient Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling,” Journal of the Royal Statistical Society, Series B, 84, 1353–1391. DOI: 10.1111/rssb.12502.
  • Gronsbell, J. L., and Cai, T. (2018), “Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance,” Journal of the Royal Statistical Society, Series B, 80, 579–594. DOI: 10.1111/rssb.12264.
  • Gu, T., Lee, P. H., and Duan, R. (2023), “COMMUTE: Communication-Efficient Transfer Learning for Multi-Site Risk Prediction,” Journal of Biomedical Informatics, 137, 104243. DOI: 10.1016/j.jbi.2022.104243.
  • He, Y., Lakhani, C. M., Rasooly, D., Manrai, A. K., Tzoulaki, I., and Patel, C. J. (2021), “Comparisons of Polyexposure, Polygenic, and Clinical Risk Scores in Risk Prediction of Type 2 Diabetes,” Diabetes Care, 44, 935–943. DOI: 10.2337/dc20-2049.
  • He, Y., Li, Q., Hu, Q., and Liu, L. (2022), “Transfer Learning in High-Dimensional Semiparametric Graphical Models with Application to Brain Connectivity Analysis,” Statistics in Medicine, 41, 4112–4129. DOI: 10.1002/sim.9499.
  • Hong, C., Rush, E., Liu, M., Zhou, D., Sun, J., Sonabend, A., Castro, A. J., Schubert, P., Panickan, V. A., Cai, T., et al. (2021), “Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data,” npj Digital Medicine, 4, 1–11. DOI: 10.1038/s41746-021-00519-z.
  • Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., and Smola, A. J. (2007), “Correcting Sample Selection Bias by Unlabeled Data,” in Advances in Neural Information Processing Systems, pp. 601–608.
  • Imbens, G. W., and Rubin, D. B. (2015), Causal Inference in Statistics, Social, and Biomedical Sciences, Cambridge: Cambridge University Press.
  • Jin, Z., Ying, Z., and Wei, L. (2001), “A Simple Resampling Method by Perturbing the Minimand,” Biometrika, 88, 381–390. DOI: 10.1093/biomet/88.2.381.
  • Kawakita, M., and Kanamori, T. (2013), “Semi-Supervised Learning with Density-Ratio Estimation,” Machine Learning, 91, 189–209. DOI: 10.1007/s10994-013-5329-8.
  • Kurreeman, F., Liao, K., Chibnik, L., Hickey, B., Stahl, E., Gainer, V., Li, G., Bry, L., Mahan, S., Ardlie, K., et al. (2011), “Genetic Basis of Autoantibody Positive and Negative Rheumatoid Arthritis Risk in a Multi-Ethnic Cohort Derived from Electronic Health Records,” The American Journal of Human Genetics, 88, 57–69. DOI: 10.1016/j.ajhg.2010.12.007.
  • Li, R., and Liang, H. (2008), “Variable Selection in Semiparametric Regression Modeling,” Annals of Statistics, 36, 261–286. DOI: 10.1214/009053607000000604.
  • Li, S., Cai, T. T., and Li, H. (2022), “Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality,” Journal of the Royal Statistical Society, Series B, 84, 149–173. DOI: 10.1111/rssb.12479.
  • Liao, K. P., Sun, J., Cai, T. A., Link, N., Hong, C., Huang, J., Huffman, J. E., Gronsbell, J., Zhang, Y., Ho, Y.-L., et al. (2019), “High-throughput Multimodal Automated Phenotyping (MAP) with Application to PheWAS,” Journal of the American Medical Informatics Association, 26, 1255–1262. DOI: 10.1093/jamia/ocz066.
  • Liu, D., and Zhou, X.-H. (2010), “A Model for Adjusting for Nonignorable Verification Bias in Estimation of the ROC Curve and its Area with Likelihood-based Approach,” Biometrics, 66, 1119–1128. DOI: 10.1111/j.1541-0420.2010.01397.x.
  • ———(2011), “Semiparametric Estimation of the Covariate-Specific ROC Curve in Presence of Ignorable Verification Bias,” Biometrics, 67, 906–916. DOI: 10.1111/j.1541-0420.2011.01562.x.
  • Liu, M., Zhang, Y., Liao, K. P., and Cai, T. (2023), “Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models,” Journal of Machine Learning Research, 24, 1–50.
  • Liu, M., Zhang, Y., and Zhou, D. (2021), “Double/Debiased Machine Learning for Logistic Partially Linear Model,” The Econometrics Journal, 24, 559–588. DOI: 10.1093/ectj/utab019.
  • Long, Q., Zhang, X., and Johnson, B. A. (2011), “Robust Estimation of Area under ROC Curve Using Auxiliary Variables in the Presence of Missing Biomarker Values,” Biometrics, 67, 559–567. DOI: 10.1111/j.1541-0420.2010.01487.x.
  • Mahajan, A., Taliun, D., Thurner, M., Robertson, N. R., Torres, J. M., Rayner, N. W., Payne, A. J., Steinthorsdottir, V., Scott, R. A., Grarup, N., et al. (2018), “Fine-Mapping Type 2 Diabetes Loci to Single-Variant Resolution Using High-Density Imputation and Islet-Specific Epigenome Maps,” Nature Genetics, 50, 1505–1513. DOI: 10.1038/s41588-018-0241-6.
  • Namkoong, H., Yadlowsky, S., and Cai, T. T. (2023), “Diagnosing Model Performance under Distribution Shift,” arXiv preprint arXiv:2303.02011.
  • Negahban, S. N., Ravikumar, P., Wainwright, M. J., Yu, B., et al. (2012), “A Unified Framework for High-Dimensional Analysis of M-estimators with Decomposable Regularizers,” Statistical Science, 27, 538–557. DOI: 10.1214/12-STS400.
  • Ning, Y., Peng, S., and Imai, K. (2020), “Robust Estimation of Causal Effects via a High-Dimensional Covariate Balancing Propensity Score,” Biometrika, 107, 533–554. DOI: 10.1093/biomet/asaa020.
  • Qiu, H., Tchetgen, E. T., and Dobriban, E. (2023), “Efficient and Multiply Robust Risk Estimation Under General Forms of Dataset Shift,” arXiv preprint arXiv:2306.16406.
  • Raskutti, G., Wainwright, M. J., and Yu, B. (2011), “Minimax Rates of Estimation for High-Dimensional Linear Regression over ℓq-balls,” IEEE Transactions on Information Theory, 57, 6976–6994.
  • Rasmy, L., Wu, Y., Wang, N., Geng, X., Zheng, W. J., Wang, F., Wu, H., Xu, H., and Zhi, D. (2018), “A Study of Generalizability of Recurrent Neural Network-based Predictive Models for Heart Failure Onset Risk Using a Large and Heterogeneous EHR Data Set,” Journal of Biomedical Informatics, 84, 11–16. DOI: 10.1016/j.jbi.2018.06.011.
  • Reddi, S. J., Poczos, B., and Smola, A. (2015), “Doubly Robust Covariate Shift Correction,” in Twenty-Ninth AAAI Conference on Artificial Intelligence. DOI: 10.1609/aaai.v29i1.9576.
  • Rotnitzky, A., Faraggi, D., and Schisterman, E. (2006), “Doubly Robust Estimation of the Area Under the Receiver-Operating Characteristic Curve in the Presence of Verification Bias,” Journal of the American Statistical Association, 101, 1276–1288. DOI: 10.1198/016214505000001339.
  • Shu, H., and Tan, Z. (2018), “Improved Estimation of Average Treatment Effects on the Treated: Local Efficiency, Double Robustness, and Beyond,” arXiv preprint arXiv:1808.01408.
  • Smucler, E., Rotnitzky, A., and Robins, J. M. (2019), “A Unifying Approach for Doubly-Robust ℓ1-regularized Estimation of Causal Contrasts,” arXiv preprint arXiv:1904.03737.
  • Tan, Z. (2020), “Model-Assisted Inference for Treatment Effects Using Regularized Calibrated Estimation with High-Dimensional Data,” Annals of Statistics, 48, 811–837.
  • Tian, Y., and Feng, Y. (2022), “Transfer Learning under High-Dimensional Generalized Linear Models,” Journal of the American Statistical Association, 118, 2684–2697. DOI: 10.1080/01621459.2022.2071278.
  • Weng, C., Shah, N. H., and Hripcsak, G. (2020), “Deep Phenotyping: Embracing Complexity and Temporality—Towards Scalability, Portability, and Interoperability,” Journal of Biomedical Informatics, 105, 103433. DOI: 10.1016/j.jbi.2020.103433.
  • Wiens, J., Guttag, J., and Horvitz, E. (2014), “A Study in Transfer Learning: Leveraging Data from Multiple Hospitals to Enhance Hospital-Specific Predictions,” Journal of the American Medical Informatics Association, 21, 699–706. DOI: 10.1136/amiajnl-2013-002162.
  • Yang, S., Gao, C., Zeng, D., and Wang, X. (2023), “Elastic Integrative Analysis of Randomised Trial and Real-World Data for Treatment Heterogeneity Estimation,” Journal of the Royal Statistical Society, Series B, 85, 575–596. DOI: 10.1093/jrsssb/qkad017.
  • Zhu, Y., Chen, Y., Lu, Z., Pan, S. J., Xue, G.-R., Yu, Y., and Yang, Q. (2011), “Heterogeneous Transfer Learning for Image Classification,” in Twenty-fifth AAAI Conference on Artificial Intelligence. DOI: 10.1609/aaai.v25i1.8090.
