Clustering

Graphical and Computational Tools to Guide Parameter Choice for the Cluster Weighted Robust Model

Pages 1195-1214 | Received 02 Mar 2022, Accepted 27 Nov 2022, Published online: 09 Jan 2023
 

Abstract

The Cluster Weighted Robust Model (CWRM) is a recently introduced methodology for robustly estimating mixtures of regressions with random covariates. The CWRM allows users to flexibly perform regression clustering while safeguarding it against data contamination and spurious solutions. Nonetheless, the resulting solution depends on the chosen number of mixture components, the percentage of impartial trimming, and the degree of heteroscedasticity both of the errors around the regression lines and of the clusters in the explanatory variables. Appropriate model selection is therefore crucial. Such a complex modeling task may generate several “legitimate” solutions, each derived from a distinct hyperparameter specification. The present article introduces a two-step monitoring procedure to help users effectively explore such a vast model space. The first phase identifies the most appropriate percentages of trimming, whilst the second phase explores the whole set of solutions, conditioning on the outcome of the first phase. The final output singles out a set of “top” solutions, whose optimality, stability and validity are assessed. Novel graphical and computational tools, specifically tailored for the CWRM framework, help the user make an educated choice among the optimal solutions. Three examples on real datasets showcase our proposal in action. Supplementary files for this article are available online.

Supplementary Materials

README: the supplemental files include a README describing the content of the supplementary materials.

Appendix: the supplemental files include a further analysis of the tourism dataset, validation of optimal solutions via the Total Sum of Squares Decomposition and additional details on computing times.

R code: the supplemental files include an R script providing a short tutorial on how to use the CWRMmonitor package (github.com/AndreaCappozzo/CWRMmonitor) implementing the monitoring procedure described in the article.

Rds file: the supplemental files include an .Rds object containing the CWRM models fitted on the AIS data, to which the monitoring procedure can be applied to recover the results reported in Section 4.3.

Acknowledgments

The authors wish to thank two anonymous reviewers and the Associate Editor for their helpful comments.

Disclosure Statement

The authors report there are no competing interests to declare.

Additional information

Funding

Andrea Cappozzo’s work is supported by the Research Programme: “Integration between study design and data analytics for generating credible evidence in the field of healthcare from heterogeneous sources of structured and unstructured data.” Francesca Greselin’s work is supported by Milano-Bicocca University Fund for Scientific Research, 2019-ATE-0076. The work of Luis A. García-Escudero and Agustín Mayo Iscar is supported by the Spanish Ministerio de Ciencia e Innovación, grant PID2021-128314NB-I00.
