Network Gradient Descent Algorithm for Decentralized Federated Learning

Pages 806-818 | Received 24 Jun 2021, Accepted 02 May 2022, Published online: 06 Jun 2022

References

  • Barut, E., Fan, J., and Verhasselt, A. (2016), “Conditional Sure Independence Screening,” Journal of the American Statistical Association, 111, 1266–1277. DOI: 10.1080/01621459.2015.1092974.
  • Bellet, A., Guerraoui, R., Taziki, M., and Tommasi, M. (2018), “Personalized and Private Peer-to-Peer Machine Learning,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 473–481.
  • Blot, M., Picard, D., Cord, M., and Thome, N. (2016), “Gossip Training for Deep Learning,” arXiv preprint arXiv:1611.09726.
  • Boyd, S., and Vandenberghe, L. (2004), Convex Optimization, Cambridge: Cambridge University Press.
  • Cheu, A., Smith, A., Ullman, J., Zeber, D., and Zhilyaev, M. (2019), “Distributed Differential Privacy via Shuffling,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, pp. 375–403.
  • Colin, I., Bellet, A., Salmon, J., and Clémençon, S. (2016), “Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions,” in International Conference on Machine Learning, PMLR, pp. 1388–1396.
  • Cotter, A., Shamir, O., Srebro, N., and Sridharan, K. (2011), “Better Mini-Batch Algorithms via Accelerated Gradient Methods,” NIPS, 24, 1647–1655.
  • Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
  • Glorot, X., and Bengio, Y. (2010), “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, pp. 249–256.
  • Hashimoto, T., Srivastava, M., Namkoong, H., and Liang, P. (2018), “Fairness Without Demographics in Repeated Loss Minimization,” in International Conference on Machine Learning, PMLR, pp. 1929–1938.
  • Ho, N., Khamaru, K., Dwivedi, R., Wainwright, M. J., Jordan, M. I., and Yu, B. (2020), “Instability, Computational Efficiency and Statistical Accuracy,” arXiv preprint arXiv:2005.11411.
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017), “Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” arXiv preprint arXiv:1704.04861.
  • Jordan, M. I., Lee, J. D., and Yang, Y. (2019), “Communication-Efficient Distributed Statistical Inference,” Journal of the American Statistical Association, 114, 668–681. DOI: 10.1080/01621459.2018.1429274.
  • Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D’Oliveira, R. G. L., Eichner, H., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, P., Qi, H., Ramage, D., Raskar, R., Raykova, M., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramér, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. (2021), “Advances and Open Problems in Federated Learning,” Foundations and Trends® in Machine Learning, 14, 1–210. DOI: 10.1561/2200000083.
  • Karimi, H., Nutini, J., and Schmidt, M. (2016), “Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp. 795–811.
  • Konečný, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., and Bacon, D. (2016), “Federated Learning: Strategies for Improving Communication Efficiency,” arXiv preprint arXiv:1610.05492.
  • Krizhevsky, A., and Hinton, G. (2009), “Learning Multiple Layers of Features from Tiny Images,” Technical Report, University of Toronto.
  • Lalitha, A., Kilinc, O. C., Javidi, T., and Koushanfar, F. (2019), “Peer-to-peer Federated Learning on Graphs,” arXiv preprint arXiv:1901.11173.
  • Lalitha, A., Shekhar, S., Javidi, T., and Koushanfar, F. (2018), “Fully Decentralized Federated Learning,” in Third Workshop on Bayesian Deep Learning (NeurIPS).
  • Lang, H., Xiao, L., and Zhang, P. (2019), “Using Statistics to Automate Stochastic Optimization,” Advances in Neural Information Processing Systems, 32, 9540–9550.
  • LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998), “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, 86, 2278–2324. DOI: 10.1109/5.726791.
  • Li, M., Zhang, T., Chen, Y., and Smola, A. J. (2014), “Efficient Mini-Batch Training for Stochastic Optimization,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 661–670.
  • Li, Y., Chen, C., Liu, N., Huang, H., Zheng, Z., and Yan, Q. (2021), “A Blockchain-Based Decentralized Federated Learning Framework with Committee Consensus,” IEEE Network, 35, 234–241. DOI: 10.1109/MNET.011.2000263.
  • Lian, X., Zhang, C., Zhang, H., Hsieh, C.-J., Zhang, W., and Liu, J. (2017), “Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent,” arXiv preprint arXiv:1705.09056.
  • Lian, X., Zhang, W., Zhang, C., and Liu, J. (2018), “Asynchronous Decentralized Parallel Stochastic Gradient Descent,” in International Conference on Machine Learning, PMLR, pp. 3043–3052.
  • McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017), “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Artificial Intelligence and Statistics, PMLR, pp. 1273–1282.
  • Nedic, A., Olshevsky, A., and Shi, W. (2017), “Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs,” SIAM Journal on Optimization, 27, 2597–2633. DOI: 10.1137/16M1084316.
  • Nesterov, Y. (1998), “Introductory Lectures on Convex Programming Volume I: Basic Course,” Lecture notes, 3, 5.
  • Rao, C. R. (1973), Linear Statistical Inference and its Applications (Vol. 2), New York: Wiley.
  • Ren, T., Cui, F., Atsidakou, A., Sanghavi, S., and Ho, N. (2022), “Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent,” arXiv preprint arXiv:2110.07810.
  • Richards, D., and Rebeschini, P. (2019), “Optimal Statistical Rates for Decentralised Non-parametric Regression with Linear Speed-Up,” arXiv preprint arXiv:1905.03135.
  • Richards, D., Rebeschini, P., and Rosasco, L. (2020), “Decentralised Learning with Random Features and Distributed Gradient Descent,” in International Conference on Machine Learning, PMLR, pp. 8105–8115.
  • Savazzi, S., Nicoli, M., and Rampa, V. (2020), “Federated Learning with Cooperating Devices: A Consensus Approach for Massive IoT Networks,” IEEE Internet of Things Journal, 7, 4641–4654. DOI: 10.1109/JIOT.2020.2964162.
  • Shao, J. (2003), Mathematical Statistics, Springer Texts in Statistics, New York: Springer.
  • Smith, V., Chiang, C.-K., Sanjabi, M., and Talwalkar, A. (2017), “Federated Multi-Task Learning,” arXiv preprint arXiv:1705.10467.
  • Tang, H., Lian, X., Yan, M., Zhang, C., and Liu, J. (2018), “Decentralized Training Over Decentralized Data,” in International Conference on Machine Learning, PMLR, pp. 4848–4856.
  • Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.
  • Van der Vaart, A. W. (2000), Asymptotic Statistics (Vol. 3), Cambridge: Cambridge University Press.
  • Vanhaesebrouck, P., Bellet, A., and Tommasi, M. (2017), “Decentralized Collaborative Learning of Personalized Models Over Networks,” in Artificial Intelligence and Statistics, PMLR, pp. 509–517.
  • Wu, Y., Ren, M., Liao, R., and Grosse, R. (2018), “Understanding Short-Horizon Bias in Stochastic Meta-Optimization,” arXiv preprint arXiv:1803.02021.
  • Yuan, K., Ling, Q., and Yin, W. (2016), “On the Convergence of Decentralized Gradient Descent,” SIAM Journal on Optimization, 26, 1835–1854. DOI: 10.1137/130943170.
