Theory and Methods

Learning Coefficient Heterogeneity over Networks: A Distributed Spanning-Tree-Based Fused-Lasso Regression

Pages 485-497 | Received 24 Jul 2021, Accepted 24 Aug 2022, Published online: 12 Dec 2022
 

Abstract

Identifying latent cluster structure based on model heterogeneity is a fundamental but challenging task that arises in many machine learning applications. In this article, we study the clustered coefficient regression problem in distributed network systems, where the data are locally collected and held by nodes. Our work aims to improve regression estimation efficiency by aggregating neighbors’ information while also identifying the cluster membership of the nodes. To achieve efficient estimation and clustering, we develop a distributed spanning-tree-based fused-lasso regression (DTFLR) approach. In particular, we propose an adaptive spanning-tree-based fusion penalty for low-complexity clustered coefficient regression. We show that our proposed estimator satisfies statistical oracle properties. Additionally, to solve the problem in parallel, we design a distributed generalized alternating direction method of multipliers algorithm, which has a simple node-based implementation scheme and enjoys a linear convergence rate. Collectively, our results contribute to the theories of low-complexity clustered coefficient regression and distributed optimization over networks. Thorough numerical experiments and real-world data analysis are conducted to verify our theoretical results; they show that our approach outperforms existing works in terms of estimation accuracy, computation speed, and communication costs. Supplementary materials for this article are available online.
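To make the idea of a spanning-tree-based fusion penalty concrete, the following is a minimal sketch and not the authors' DTFLR implementation. It assumes a generic least-squares loss at each node, an unweighted l1 fusion penalty, and a breadth-first spanning tree; the abstract does not specify how the tree is built, and the adaptive weights it mentions are omitted here. The names `bfs_spanning_tree`, `tree_fused_lasso_objective`, and `lam` are hypothetical. The point illustrated is that penalizing coefficient differences only along the n - 1 edges of a spanning tree, rather than along every network edge, keeps the number of penalty terms low.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root=0):
    """Return the (parent, child) edges of a BFS spanning tree.

    Assumes `adjacency` maps each node to a list of neighbors and that
    the graph is connected. BFS is an illustrative choice only.
    """
    visited = {root}
    queue = deque([root])
    edges = []
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                visited.add(v)
                edges.append((u, v))
                queue.append(v)
    return edges


def tree_fused_lasso_objective(X, y, beta, tree_edges, lam):
    """Sum of local least-squares losses plus an l1 fusion penalty.

    X[i] is a list of feature rows held by node i, y[i] the matching
    responses, and beta[i] node i's coefficient vector. The penalty is
    unweighted (an assumption; the article describes an adaptive one).
    """
    loss = 0.0
    for Xi, yi, bi in zip(X, y, beta):
        for row, target in zip(Xi, yi):
            prediction = sum(r * b for r, b in zip(row, bi))
            loss += 0.5 * (target - prediction) ** 2
    penalty = sum(
        sum(abs(a - b) for a, b in zip(beta[i], beta[j]))
        for i, j in tree_edges
    )
    return loss + lam * penalty


# Four nodes on a ring; nodes 0 and 1 form one cluster, nodes 2 and 3 another.
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree_edges = bfs_spanning_tree(adjacency)

beta = [[1.0], [1.0], [3.0], [3.0]]          # one coefficient per node
X = [[[1.0]], [[1.0]], [[1.0]], [[1.0]]]     # one observation per node
y = [[1.0], [1.0], [3.0], [3.0]]             # responses fit exactly
objective = tree_fused_lasso_objective(X, y, beta, tree_edges, lam=0.5)
```

In this toy setup the tree has three edges instead of the ring's four, and only the tree edges that cross the two clusters contribute to the penalty; edges inside a cluster contribute zero, which is what lets a fusion penalty recover cluster membership.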

Supplementary Materials

The supplementary materials contain detailed derivations for the proposed algorithm, proofs of the theoretical results, and additional numerical simulations. The implementation code is also provided in the supplementary materials.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 Our algorithms and results in this article can easily be extended to cases with datasets of unbalanced sizes.

2 The $O$ and $\Theta$ notations are the Bachmann–Landau notations (Knuth 1976). $a_n = O(b_n)$ denotes that there exist positive constants $C$ and $n_0$ with $|a_n| \le C b_n$ for all $n \ge n_0$; $a_n = \Theta(b_n)$ denotes that there exist positive constants $C$, $C'$ and $n_0$ with $C b_n \le a_n \le C' b_n$ for all $n \ge n_0$.
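As a quick worked instance of these definitions (an illustration, not taken from the article), take $a_n = 3n^2 + 5n$ and $b_n = n^2$. Since $5n \le 5n^2$ whenever $n \ge 1$, both bounds hold with lower constant 3, upper constant 8, and $n_0 = 1$:

```latex
3\,n^2 \;\le\; 3n^2 + 5n \;\le\; 8\,n^2 \quad \text{for all } n \ge 1,
\qquad \text{so } a_n = \Theta(n^2) \text{ and, in particular, } a_n = O(n^2).
```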

Additional information

Funding

This work has been supported in part by NSF grants CAREER CNS-2110259, CNS-2112471, CCF-1934884, CCF-2110252, and a Google Faculty Research Award.
