Probabilistic K-means with Local Alignment for Clustering and Motif Discovery in Functional Data

Pages 1119-1130 | Received 30 Dec 2020, Accepted 02 Dec 2022, Published online: 08 Feb 2023
 

Abstract

We develop a new method to locally cluster curves and discover functional motifs, that is, typical shapes that may recur several times along and across the curves and that capture important local characteristics. To identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high-similarity seeds), and fuzzy clustering (a curve can belong to more than one cluster if it contains more than one typical shape). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to Italian Covid-19 death curves and Omics data related to mutagenesis. Supplementary materials for this article are available online.

Supplementary Materials

Supplementary material includes proofs, additional methods and results. An R implementation (with examples) is available at https://github.com/marziacremona/ProbKMA-FMD.

Acknowledgments

We thank Matthew Reimherr and Piercesare Secchi for discussions about functional data methodology; Kateryna D. Makova and Di (Bruce) Chen for help with the mutagenesis application; Valeria Vitelli and Davide Floriello for their sparse functional clustering code.

Disclosure Statement

The authors report there are no competing interests to declare.

Correction Statement

This article was originally published with errors, which have now been corrected in the online version. Please see the Correction (http://dx.doi.org/10.1080/10618600.2024.2356159).

Additional information

Funding

This work was partially funded by the Eberly College of Science, the Institute for Computational and Data Sciences and the Huck Institutes of the Life Sciences (Penn State University); NSF award DMS-1407639; and Tobacco Settlement and CURE funds of the PA Department of Health. M.A. Cremona acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).
