Research Article

Outcome-guided Bayesian clustering for disease subtype discovery using high-dimensional transcriptomic data

Received 24 Nov 2022, Accepted 23 May 2024, Published online: 07 Jun 2024

References

  • E. Bair and R. Tibshirani, Semi-supervised methods to predict patient survival from gene expression data, PLoS. Biol. 2 (2004), pp. e108.
  • S. Basu, A. Banerjee, and R.J. Mooney, Active semi-supervision for pairwise constrained clustering, in Proceedings of the 2004 SIAM International Conference on Data Mining, SIAM, 2004, pp. 333–344.
  • C. Bouveyron and C. Brunet-Saumard, Model-based clustering of high-dimensional data: A review, Comput. Stat. Data. Anal. 71 (2014), pp. 52–78.
  • H. Braak and E. Braak, Neuropathological stageing of Alzheimer-related changes, Acta. Neuropathol. 82 (1991), pp. 239–259.
  • D.E. Bredesen, Metabolic profiling distinguishes three subtypes of Alzheimer's disease, Aging 7 (2015), pp. 595–600.
  • A.S. Coates, E.P. Winer, A. Goldhirsch, R.D. Gelber, M. Gnant, M. Piccart-Gebhart, B. Thürlimann, H.J. Senn, Panel Members, F. André, and J. Baselga, Tailoring therapies–improving the management of early breast cancer: St Gallen international expert consensus on the primary therapy of early breast cancer 2015, Ann. Oncol. 26 (2015), pp. 1533–1546.
  • D.R. Cox and E.J. Snell, Analysis of binary data, Biometrics 46(2) (1990), pp. 550.
  • Y. Cruz-Almeida, A. Johnson, L. Meng, P. Sinha, A. Rani, S. Yoder, Z. Huo, T.C. Foster, and R.B. Fillingim, Epigenetic age predictors in community-dwelling adults with high impact knee pain, Mol. Pain. 18 (2022), pp. 174480692211180.
  • N. Cunningham, J.E. Griffin, and D.L. Wild, ParticleMDI: Particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification, Adv. Data. Anal. Classif. 14 (2020), pp. 463–484.
  • C. Curtis, S.P. Shah, S.F. Chin, G. Turashvili, O.M. Rueda, M.J. Dunning, D. Speed, A.G. Lynch, S. Samarajiwa, Y. Yuan, and S. Gräf, The genomic and transcriptomic architecture of 2000 breast tumours reveals novel subgroups, Nature 486 (2012), pp. 346–352.
  • A. Di Benedetto, C. Ercolani, M. Mottolese, F. Sperati, L. Pizzuti, P. Vici, I. Terrenato, A.M. Shaaban, M.P. Humphries, and L. Di Lauro, Analysis of the ATR-Chk1 and ATM-Chk2 pathways in male breast cancer revealed the prognostic significance of ATR expression, Sci. Rep. 7 (2017), pp. 8078.
  • T.T. Drashansky, E.Y. Helm, N. Curkovic, J. Cooper, P. Cheng, X. Chen, N. Gautam, L. Meng, A.J. Kwiatkowski, W.O. Collins, and B.G. Keselowsky, BCL11B is positioned upstream of PLZF and RORγt to control thymic development of mucosal-associated invariant T cells and MAIT17 program, iScience 24 (2021), pp. 102307.
  • S. Dudoit and J. Fridlyand, A prediction-based resampling method for estimating the number of clusters in a dataset, Genome. Biol. 3 (2002), pp. 1–21.
  • B. Efron and R. Tibshirani, Empirical Bayes methods and false discovery rates for microarrays, Genet. Epidemiol. 23 (2002), pp. 70–86.
  • M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. 95 (1998), pp. 14863–14868.
  • S. Gaynor and E. Bair, Identification of relevant subtypes via preweighted sparse clustering, Comput. Stat. Data. Anal. 116 (2017), pp. 139–154.
  • S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE. Trans. Pattern. Anal. Mach. Intell. PAMI-6 (1984), pp. 721–741.
  • M. Giuliano, M.V. Trivedi, and R. Schiff, Bidirectional crosstalk between the estrogen receptor and human epidermal growth factor receptor 2 signaling pathways in breast cancer: Molecular basis and clinical implications, Breast Care 8 (2013), pp. 256–262.
  • Y. Guo, X. Shang, and Z. Li, Identification of cancer subtypes by integrating multiple types of transcriptomics data with deep learning in breast cancer, Neurocomputing 324 (2019), pp. 20–30.
  • C. Gutierrez and R. Schiff, HER2: Biology, detection, and clinical implications, Arch. Pathol. Lab. Med. 135 (2011), pp. 55–62.
  • S. Han, D. Fu, G.W. Tushoski, L. Meng, K.M. Herremans, A.N. Riner, T.J. George, Z. Huo, and S.J. Hughes, Single-cell profiling of microenvironment components by spatial localization in pancreatic ductal adenocarcinoma, Theranostics 12 (2022), pp. 4980–4992.
  • S.H. Hare and A.J. Harvey, mTOR function and therapeutic targeting in breast cancer, Am. J. Cancer. Res. 7 (2017), pp. 383.
  • K.A. Heller and Z. Ghahramani, Bayesian hierarchical clustering, in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 297–304.
  • L. Hubert and P. Arabie, Comparing partitions, J. Classif. 2 (1985), pp. 193–218.
  • Z. Huo, Y. Ding, S. Liu, S. Oesterreich, and G. Tseng, Meta-analytic framework for sparse k-means to identify disease subtypes in multiple transcriptomic studies, J. Am. Stat. Assoc. 111 (2016), pp. 27–42.
  • Z. Huo and G. Tseng, Integrative sparse k-means with overlapping group lasso in genomic applications for disease subtype discovery, Ann. Appl. Stat. 11 (2017), pp. 1011–1039.
  • H. Ishwaran and J.S. Rao, Spike and slab variable selection: Frequentist and Bayesian strategies, Ann. Stat. 33 (2005), pp. 730–773.
  • N.M. Iyengar, X.K. Zhou, H. Mendieta, O. El-Hely, D.D. Giri, L. Winston, D.J. Falcone, H. Wang, L. Meng, T. Ha, and M. Pollak, Effects of obesity on breast aromatase expression and systemic metabo-inflammation in women with BRCA1 or BRCA2 mutations, NPJ. Breast. Cancer. 7 (2021), pp. 18.
  • P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et des Jura, Bull. Soc. Vaudoise Sci. Nat. 37 (1901), pp. 547–579.
  • B.D. Lehmann, J.A. Bauer, X. Chen, M.E. Sanders, A.B. Chakravarthy, Y. Shyr, and J.A. Pietenpol, Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies, J. Clin. Invest. 121 (2011), pp. 2750–2767.
  • X. Luo and Y. Wei, Batch effects correction with unknown subtypes, J. Am. Stat. Assoc. 114 (2019), pp. 581–594.
  • J.M. Marin, K. Mengersen, and C.P. Robert, Bayesian modelling and inference on mixtures of distributions, Handbook Stat 25 (2005), pp. 459–507.
  • J.M. Marin and C.P. Robert, Bayesian Core: A Practical Approach to Computational Bayesian Statistics, 268, Springer, New York, 2007.
  • G.J. McLachlan, R. Bean, and D. Peel, A mixture model-based approach to the clustering of microarray expression data, Bioinformatics 18 (2002), pp. 413–422.
  • M. Medvedovic, K.Y. Yeung, and R.E. Bumgarner, Bayesian mixture model based clustering of replicated microarray data, Bioinformatics 20 (2004), pp. 1222–1232.
  • L. Meng, D. Avram, G. Tseng, and Z. Huo, Outcome-guided sparse k-means for disease subtype discovery via integrating phenotypic data with high-dimensional transcriptomic data, J. R. Stat. Soc. Ser. C: Appl. Stat. 71 (2022), pp. 352–375.
  • S. Montesino-Goicolea, L. Meng, A. Rani, Z. Huo, T.C. Foster, R.B. Fillingim, and Y. Cruz-Almeida, Enrichment of genomic pathways based on differential DNA methylation profiles associated with knee osteoarthritis pain, Neurobiol. Pain 12 (2022), pp. 100107.
  • D.C. Montrose, R. Nishiguchi, S. Basu, H.A. Staab, X.K. Zhou, H. Wang, L. Meng, M. Johncilla, J.R. Cubillos-Ruiz, D.K. Morales, and M.T. Wells, Dietary fructose alters the composition, localization, and metabolism of gut microbiota in association with worsening colitis, Cell. Mol. Gastroenterol. Hepatol. 11 (2021), pp. 525–550.
  • M.A. Newton, A. Noueiry, D. Sarkar, and P. Ahlquist, Detecting differential gene expression with a semiparametric hierarchical mixture method, Biostatistics 5 (2004), pp. 155–176.
  • G. Nowak and R. Tibshirani, Complementary hierarchical clustering, Biostatistics 9 (2008), pp. 467–483.
  • W. Pan and X. Shen, Penalized model-based clustering with application to variable selection, J. Mach. Learn. Res. 8 (2007), pp. 1145–1164.
  • J.S. Parker, M. Mullins, M.C. Cheang, S. Leung, D. Voduc, T. Vickery, S. Davies, C. Fauron, X. He, Z. Hu, and J.F. Quackenbush, Supervised risk predictor of breast cancer based on intrinsic subtypes, J. Clin. Oncol. 27 (2009), pp. 1160–1167.
  • D.W. Parsons, S. Jones, X. Zhang, J.C.H. Lin, R.J. Leary, P. Angenendt, P. Mankoo, H. Carter, I.M. Siu, G.L. Gallia, and A. Olivi, An integrated genomic analysis of human glioblastoma multiforme, Science 321 (2008), pp. 1807–1812.
  • A. Paul and S. Paul, The breast cancer susceptibility genes (BRCA) in breast and ovarian cancers, Front Biosci (Schol Ed) 19 (2014), pp. 605.
  • C.M. Perou, T. Sørlie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, C.A. Rees, J.R. Pollack, D.T. Ross, H. Johnsen, L.A. Akslen, and Ø. Fluge, Molecular portraits of human breast tumours, Nature 406 (2000), pp. 747–752.
  • J.A. Peterson, J.A. Crow, A.J. Johnson, L. Meng, A. Rani, Z. Huo, T.C. Foster, R.B. Fillingim, and Y. Cruz-Almeida, Pain interference mediates the association between epigenetic aging and grip strength in middle to older aged males and females with chronic pain, Front. Aging. Neurosci. 15 (2023), pp. 1122364.
  • A. Prat, E. Pineda, B. Adamo, P. Galván, A. Fernández, L. Gaba, M. Díez, M. Viladot, A. Arance, and M. Muñoz, Clinical implications of the intrinsic molecular subtypes of breast cancer, The Breast 24 (2015), pp. S26–S35.
  • Z.S. Qin, Clustering microarray gene expression data using weighted Chinese restaurant process, Bioinformatics 22 (2006), pp. 1988–1997.
  • A. Reif, E. Grünblatt, S. Herterich, I. Wichart, M.K. Rainer, S. Jungwirth, W. Danielczyk, J. Deckert, K.H. Tragl, P. Riederer, and P. Fischer, Association of a functional NOS1 promoter repeat with Alzheimer's disease in the VITA cohort, J. Alzheimers. Dis. 23 (2011), pp. 327–333.
  • C.P. Robert and G. Casella, Monte Carlo Statistical Methods, 2, Springer, New York, 1999.
  • A. Rosenwald, G. Wright, W.C. Chan, J.M. Connors, E. Campo, R.I. Fisher, R.D. Gascoyne, H.K. Muller-Hermelink, E.B. Smeland, J.M. Giltnane, and E.M. Hurt, The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, N. Engl. J. Med. 346 (2002), pp. 1937–1947.
  • P.J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987), pp. 53–65.
  • R. Roy, J. Chun, and S.N. Powell, BRCA1 and BRCA2: Different roles in a common pathway of genome protection, Nat. Rev. Cancer 12 (2012), pp. 68–78.
  • A. Sadanandam, C.A. Lyssiotis, K. Homicsko, E.A. Collisson, W.J. Gibb, S. Wullschleger, L.C.G. Ostos, W.A. Lannon, C. Grotzinger, M. Del Rio, and B. Lhermitte, A colorectal cancer classification system that associates cellular phenotype and responses to therapy, Nat. Med. 19 (2013), pp. 619–625.
  • R.S. Savage, Z. Ghahramani, J.E. Griffin, P. Kirk, and D.L. Wild, Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data, arXiv preprint arXiv:1304.3577 (2013).
  • G. Schwarz, Estimating the dimension of a model, Ann. Stat. 6 (1978), pp. 461–464.
  • R. Shen, A.B. Olshen, and M. Ladanyi, Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis, Bioinformatics 25 (2009), pp. 2906–2912.
  • D.J. Slamon, G.M. Clark, S.G. Wong, W.J. Levin, A. Ullrich, and W.L. McGuire, Human breast cancer: Correlation of relapse and survival with amplification of the HER-2/neu oncogene, Science 235 (1987), pp. 177–182.
  • L. Strath, J.A. Peterson, L. Meng, A. Rani, Z. Huo, T.C. Foster, R. Fillingim, and Y. Cruz-Almeida, Socioeconomic status, knee pain, and epigenetic aging in community-dwelling middle-to-older age adults, J. Pain. 24 (2023), pp. 68.
  • L.J. Strath, L. Meng, A. Rani, Z. Huo, T.C. Foster, R.B. Fillingim, and Y. Cruz-Almeida, Vitamin D metabolism genes are differentially methylated in individuals with chronic knee pain, Lifestyle Genom. 16 (2023), pp. 98–105.
  • R.W. Tothill, A.V. Tinker, J. George, R. Brown, S.B. Fox, S. Lade, D.S. Johnson, M.K. Trivett, D. Etemadmoghadam, B. Locandro, and N. Traficante, Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome, Clin. Cancer. Res. 14 (2008), pp. 5198–5208.
  • L.J. Van't Veer, H. Dai, M.J. Van De Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. Van Der Kooy, M.J. Marton, A.T. Witteveen, and G.J. Schreiber, Gene expression profiling predicts clinical outcome of breast cancer, Nature 415 (2002), pp. 530–536.
  • R.G. Verhaak, K.A. Hoadley, E. Purdom, V. Wang, Y. Qi, M.D. Wilkerson, C.R. Miller, L. Ding, T. Golub, J.P. Mesirov, and G. Alexe, Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1, Cancer. Cell. 17 (2010), pp. 98–110.
  • F.R. Vogenberg, C.I. Barash, and M. Pursel, Personalized medicine: Part 1: Evolution and development into theranostics, Phar. Ther. 35 (2010), pp. 560.
  • G. Von Minckwitz, M. Untch, J.U. Blohmer, S.D. Costa, H. Eidtmann, P.A. Fasching, B. Gerber, W. Eiermann, J. Hilfrich, J. Huober, and C. Jackisch, Definition and impact of pathologic complete response on prognosis after neoadjuvant chemotherapy in various intrinsic breast cancer subtypes, J. Clin. Oncol. 30 (2012), pp. 1796–1804.
  • B. Wang, A.M. Mezlini, F. Demir, M. Fiume, Z. Tu, M. Brudno, B. Haibe-Kains, and A. Goldenberg, Similarity network fusion for aggregating data types on a genomic scale, Nat. Methods. 11 (2014), pp. 333–337.
  • C.H. Williams-Gray and R.A. Barker, Parkinson disease: Defining PD subtypes – a step toward personalized management? Nat. Rev. Neurol. 13 (2017), pp. 454–455.
  • D.M. Witten and R. Tibshirani, A framework for feature selection in clustering, J. Am. Stat. Assoc. 105 (2010), pp. 713–726.
  • C.A. Wolff, M.A. Gutierrez-Monreal, L. Meng, X. Zhang, L.G. Douma, H.M. Costello, C.M. Douglas, E. Ebrahimi, A. Pham, A.C. Oliveira, and C. Fu, Defining the age-dependent and tissue-specific circadian transcriptome in male mice, Cell. Rep. 42 (2023), pp. 111982.
  • B. Xie, W. Pan, and X. Shen, Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables, Electron. J. Stat. 2 (2008), pp. 168–212.
  • J. Zhao, Y. Deng, Z. Jiang, and H. Qing, G protein-coupled receptors (GPCRs) in Alzheimer's disease: A focus on BACE1 related GPCRs, Front. Aging. Neurosci. 8 (2016), pp. 58.
  • Z. Zou, T. Tao, H. Li, and X. Zhu, mTOR signaling pathway and mTOR inhibitors in cancer: Progress and challenges, Cell. Biosci. 10 (2020), pp. 1–11.