Theory and Methods

Federated Offline Reinforcement Learning

Received 11 Jun 2022, Accepted 19 Jan 2024, Published online: 01 Apr 2024

