Mycology
An International Journal on Fungal Biology
Research Article

An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses

Received 17 Dec 2023, Accepted 17 May 2024, Published online: 05 Jun 2024

References

  • 1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature. 526(7571):68–74. doi: 10.1038/nature15393.
  • Alexander DH, Lange K. 2011. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinform. 12(1):246. doi: 10.1186/1471-2105-12-246.
  • Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664. doi: 10.1101/gr.094052.109.
  • Aylward AJ, Petrus S, Mamerto A, Hartwick NT, Michael TP, Alkan C. 2023. PanKmer: k-mer-based and reference-free pangenome analysis. Bioinformatics. 39(10):btad621. doi: 10.1093/bioinformatics/btad621.
  • Ballouz S, Dobin A, Gillis JA. 2019. Is it time to change the reference genome? Genome Biol. 20(1):159. doi: 10.1186/s13059-019-1774-4.
  • Bayer PE, Petereit J, Durant É, Monat C, Rouard M, Hu HF, Chapman B, Li CD, Cheng SF, Batley J, et al. 2022. Wheat Panache: A pangenome graph database representing presence-absence variation across sixteen bread wheat genomes. Plant Genome. 15(3):e20221. doi: 10.1002/tpg2.20221.
  • Bernard G, Chan CX, Chan YB, Chua XY, Cong Y, Hogan JM, Maetschke SR, Ragan MA. 2019. Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform. 20(2):426–435. doi: 10.1093/bib/bbx067.
  • Bromberg R, Grishin NV, Otwinowski Z. 2016. Phylogeny reconstruction with alignment-free method that corrects for horizontal gene transfer. PLoS Comput Biol. 12(6):e1004985. doi: 10.1371/journal.pcbi.1004985.
  • Casillas S, Barbadilla A. 2017. Molecular population genetics. Genetics. 205(3):1003–1035. doi: 10.1534/genetics.116.196493.
  • Chen S, Krusche P, Dolzhenko E, Sherman RM, Petrovski R, Schlesinger F, Kirsche M, Bentley DR, Schatz MC, Sedlazeck FJ, et al. 2019. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol. 20(1):291. doi: 10.1186/s13059-019-1909-7.
  • Duan SF, Han PJ, Wang QM, Liu WQ, Shi JY, Li K, Zhang XL, Bai FY. 2018. The origin and adaptive evolution of domesticated populations of yeast from Far East Asia. Nat Commun. 9(1):2690. doi: 10.1038/s41467-018-05106-7.
  • Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, et al. 2022. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet. 54(4):518–525. doi: 10.1038/s41588-022-01043-w.
  • Fan H, Ives AR, Surget-Groba Y, Cannon CH. 2015. An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. BMC Genomics. 16(1):522. doi: 10.1186/s12864-015-1647-5.
  • Felsenstein J. 1989. Mathematics vs. evolution [review of Mathematical evolutionary theory, Feldman MW, ed. Princeton (NJ): Princeton University Press; 1989]. Science. 246(4932):941–942. doi: 10.1126/science.246.4932.941.
  • Gardner SN, Hall BG. 2013. When whole-genome alignments just won’t work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes. PLoS One. 8(12):e81760. doi: 10.1371/journal.pone.0081760.
  • Grytten I, Rand KD, Sandve GK. 2022. KAGE: fast alignment-free graph-based genotyping of SNPs and short indels. Genome Biol. 23(1):209. doi: 10.1186/s13059-022-02771-2.
  • Han DY, Han PJ, Rumbold K, Koricha AD, Duan SF, Song L, Shi JY, Li K, Wang QM, Bai FY. 2021. Adaptive gene content and allele distribution variations in the wild and domesticated populations of Saccharomyces cerevisiae. Front Microbiol. 12:631250. doi: 10.3389/fmicb.2021.631250.
  • Hatje K, Kollmar M. 2012. A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method. Front Plant Sci. 3:192. doi: 10.3389/fpls.2012.00192.
  • Haubold B. 2014. Alignment-free phylogenetics and population genetics. Brief Bioinform. 15:407–418. doi: 10.1093/bib/bbt083.
  • Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G. 2012. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet. 44(2):226–232. doi: 10.1038/ng.1028.
  • Jaillard M, Lima L, Tournoud M, Mahé P, van Belkum A, Lacroix V, Jacob L, Didelot X. 2018. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLoS Genet. 14(11):e1007758. doi: 10.1371/journal.pgen.1007758.
  • Johnston HR, Keats BJB, Sherman SL. 2019. Population genetics. In: Pyeritz RE, et al., editors. Emery and Rimoin’s principles and practice of medical genetics and genomics. 7th ed. London: Academic Press; p. 359–373.
  • Jonkheer EM, van Workum DM, Anari SS, Brankovics B, de Haan JR, Berke L, van der Lee TAJ, de Ridder D, Smit S. 2022. PanTools v3: functional annotation, classification and phylogenomics. Bioinformatics. 38(18):4403–4405. doi: 10.1093/bioinformatics/btac506.
  • Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 35(6):1547–1549. doi: 10.1093/molbev/msy096.
  • Lee H, Shuaibi A, Bell JM, Pavlichin DS, Ji HP. 2020. Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations. NAR Cancer. 2(4):zcaa034. doi: 10.1093/narcan/zcaa034.
  • Lees JA, Vehkala M, Välimäki N, Harris SR, Chewapreecha C, Croucher NJ, Marttinen P, Davies MR, Steer AC, Tong SY, et al. 2016. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nat Commun. 7(1):12797. doi: 10.1038/ncomms12797.
  • Li HB, Wang SH, Chai S, Yang ZQ, Zhang QQ, Xin HJ, Xu YC, Lin SN, Chen XX, Yao ZW, et al. 2022. Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat Commun. 13(1):682. doi: 10.1038/s41467-022-28362-0.
  • Parfrey LW, Lahr DJG, Katz LA. 2008. The dynamic nature of eukaryotic genomes. Mol Biol Evol. 25:787–794. doi: 10.1093/molbev/msn032.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Müller A, Nothman J, Louppe G. 2011. Scikit-learn: machine learning in Python. J Mach Learn Res. 12:2825–2830.
  • Peter J, De Chiara M, Friedrich A, Yue JX, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, et al. 2018. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature. 556(7701):339–344. doi: 10.1038/s41586-018-0030-5.
  • Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575. doi: 10.1086/519795.
  • Qi J, Wang B, Hao BI. 2004. Whole proteome prokaryote phylogeny without sequence alignment: a k-string composition approach. J Mol Evol. 58(1):1–11. doi: 10.1007/s00239-003-2493-7.
  • Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, et al. 2021. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 184(13):3542–3558.e16. doi: 10.1016/j.cell.2021.04.046.
  • Rahman A, Hallgrímsdóttir I, Eisen M, Pachter L. 2018. Association mapping from sequencing reads using k-mers. eLife. 7:e32920. doi: 10.7554/eLife.32920.
  • Rakocevic G, Semenyuk V, Lee W, Spencer J, Browning J, Johnson IJ, Arsenijevic V, Nadj J, Ghose K, Suciu MC, et al. 2019. Fast and accurate genomic analyses using genome graphs. Nat Genet. 51(2):354–362. doi: 10.1038/s41588-018-0316-4.
  • Reinert G, Chew D, Sun F, Waterman MS. 2009. Alignment-free sequence comparison (I): statistics and power. J Comput Biol. 16(12):1615–1634. doi: 10.1089/cmb.2009.0198.
  • Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. 2018. Alignment-free sequence analysis and applications. Annu Rev Biomed Data Sci. 1:93–114. doi: 10.1146/annurev-biodatasci-080917-013431.
  • Robinson DF, Foulds LR. 1981. Comparison of phylogenetic trees. Math Biosci. 53(1–2):131–147. doi: 10.1016/0025-5564(81)90043-2.
  • Sims GE, Jun SR, Wu GA, Kim SH. 2009. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc Natl Acad Sci U S A. 106:2677–2682. doi: 10.1073/pnas.0813249106.
  • Sirén J, Monlong J, Chang X, Novak AM, Eizenga JM, Markello C, Sibbesen JA, Hickey G, Chang P, Carroll A, et al. 2021. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science. 374(6574):abg8871. doi: 10.1126/science.abg8871.
  • Voichek Y, Weigel D. 2020. Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nat Genet. 52(5):534–540. doi: 10.1038/s41588-020-0612-7.
  • Wang H, Xu Z, Gao L, Hao B. 2009. A fungal phylogeny based on 82 complete genomes using the composition vector method. BMC Evol Biol. 9(1):195. doi: 10.1186/1471-2148-9-195.
  • Wang J, Yang W, Zhang S, Hu H, Yuan Y, Dong J, Chen L, Ma Y, Yang T, Zhou L, et al. 2023. A pangenome analysis pipeline provides insights into functional gene identification in rice. Genome Biol. 24(1):19. doi: 10.1186/s13059-023-02861-9.
  • Yi H, Jin L. 2013. Co-phylog: an assembly-free phylogenomic approach for closely related organisms. Nucleic Acids Res. 41:e75. doi: 10.1093/nar/gkt003.
  • Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, Wu Y, Cheng L, Fang Y, Wu K, et al. 2022. Graph pangenome captures missing heritability and empowers tomato breeding. Nature. 606(7914):527–534. doi: 10.1038/s41586-022-04808-9.
  • Zielezinski A, Vinga S, Almeida J, Karlowski WM. 2017. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol. 18(1):186. doi: 10.1186/s13059-017-1319-7.