Original Articles

Clustering and classification problems in genetics through U-statistics

Pages 1882-1902 | Received 10 Apr 2017, Accepted 29 Aug 2017, Published online: 21 Sep 2017

