
RAD sequencing for the development of microsatellite markers for identification of Malaysian taro cultivars

Pages 1284-1290 | Received 11 May 2021, Accepted 12 Aug 2021, Published online: 25 Aug 2021

Abstract

Advances in next-generation sequencing (NGS) have transformed the discovery and development of microsatellite markers, particularly for understudied species. In this study, restriction site-associated DNA sequencing (RAD-seq), one of the NGS-based approaches, was used to identify and develop microsatellite markers for taro. A total of 8,490,604 clean reads were generated and assembled into 222,445 contigs, of which only 3,280 contained microsatellites, with 3,454 repeats in total. Flanking primers were successfully designed for only 1,790 of these repeats, and 14 microsatellites were selected at random for cultivar identification of Malaysian taro. Of the 14 novel simple sequence repeat (SSR) markers analysed, twelve were polymorphic and the remaining two, CesSSR9 and CesSSR11, were monomorphic. The polymorphic SSR markers revealed 50 alleles, ranging from two (CesSSR5) to seven (CesSSR13) per locus, with an average of 4.17 alleles per locus. The polymorphism information content values ranged from 0.05 (CesSSR5) to 0.59 (CesSSR7), with an average of 0.33 per locus. The unweighted pair group method with arithmetic mean (UPGMA) dendrogram showed that cultivar MINYAK could not be differentiated from cultivar PUTIH. These newly developed microsatellite markers provide a tool for molecular identification in taro and information needed for future biodiversity conservation and gene bank management of taro.

Introduction

Taro (Colocasia esculenta) is a vegetatively propagated root crop grown in most tropical regions. It belongs to the monocotyledonous family Araceae. It can flower and set seed, and it is an important socio-economic crop in the Pacific and Southeast Asia regions. Many studies have attempted to discover the origins of this crop, with New Guinea and northeast India proposed as possible locations of its initial domestication [Citation1]. Moreover, the enormous variety of wild Colocasia species appears to extend from the north-eastern part of India to the Himalayas in southern China [Citation2]. Even though abundant genetic resources exist, the numerous attempts to exploit and conserve the germplasm needed to sustain production of this species have not yet resolved the problem. Therefore, precise and accurate molecular characterisation using molecular markers is needed for appropriate conservation management and future breeding programmes for taro. Microsatellites, also known as simple sequence repeats (SSR), are among the most widely used DNA markers and have been employed over the last few decades in cultivar identification [Citation3]. These markers are reproducible, highly polymorphic, codominant, and abundant throughout the eukaryotic genome [Citation4,Citation5], making them favourable DNA markers for a variety of genetic research studies.

Despite the emergence of a more recent marker system, the single nucleotide polymorphism (SNP), microsatellites are still preferred for various research applications, including cultivar identification and authentication, population structure analyses, genetic drift and gene flow, and pedigree or breeding estimation [Citation6]. Meanwhile, next-generation sequencing (NGS) has become a mainstream, high-throughput approach for identifying and discovering thousands of microsatellites at reduced cost, even in non-model species [Citation7,Citation8].

Using the conventional approach, cloning and extensive Sanger sequencing were needed to discover even a small number of microsatellites. The NGS approach is therefore preferred. One valuable NGS-based approach is RAD-seq, which targets the genomic sequences around restriction sites; the resulting reads are assembled into contigs for microsatellite screening. The RAD-seq approach can produce efficient and effective results for multiple applications, namely population differentiation, marker improvement, and phylogeography [Citation9–11]. RAD-seq also has an advantage over the more conventional hybridisation approach because it allows the same region to be sequenced across individuals or diverse species.

Materials and methods

Plant material and DNA extraction for RAD sequencing

A sample of C. esculenta (cultivar WANGI) was obtained from the MARDI Taro Collection at MyGenebank MARDI. Following the manufacturer’s protocol, total genomic DNA was extracted from fresh young leaves using the Qiagen Plant DNA extraction kit. DNA integrity and concentration were then measured using 0.8% agarose gel electrophoresis and a Fluoroskan Ascent, respectively. The genomic DNA was then sent to MyTACG Bioscience Enterprise (Sri Petaling, Malaysia) for RAD sequencing.

RAD-seq sequencing and contigs assembly

Following the standard manufacturer’s protocol, MyTACG Bioscience Enterprise prepared a RAD library in which total genomic DNA was digested with MseI, and the library was sequenced on an Illumina HiSeq 2000 sequencer. The raw data were processed using Stacks [Citation12], and the processed reads were then assembled de novo using Velvet [Citation13].

SSR identification and primer design

MISA, a MIcro-SAtellite identification tool, was used to screen the assembled contigs for microsatellite repeat motifs [Citation14]; mononucleotide repeats were omitted from the search criteria. The minimum numbers of repeats were set to 6, 5, 4, 3 and 3 for the di-, tri-, tetra-, penta- and hexanucleotide repeat types, respectively. Primer pairs were designed using the Primer3Plus software programme, with the amplicon size restricted to between 100 and 250 bp. All designed primer pairs were then optimised to determine the best annealing temperature for amplification.
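The screening step can be illustrated with a minimal sketch. It is not MISA itself but a simplified regular-expression scan, reading the thresholds in the text as minimum repeat counts of 6, 5, 4, 3 and 3 for di- to hexanucleotide motifs. The function names and the example contig are hypothetical, and compound or interrupted repeats are not handled.

```python
import re

# Minimum repeat counts per motif length (di- to hexanucleotide),
# as read from the search criteria in the text; mononucleotides are skipped.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. "AA" or "ATAT" belong to a shorter motif class
            start, end = match.start(), match.end()
            # Skip regions already reported for a shorter motif length.
            if any(start < e and s < end for s, e, _, _ in hits):
                continue
            hits.append((start, end, motif, (end - start) // unit_len))
    return sorted(hits)


# Hypothetical contig: a (CT)7 and an (AAG)5 repeat in arbitrary flanking sequence.
example_contig = "GGA" + "CT" * 7 + "TTGACC" + "AAG" * 5 + "CGTA"
ssr_hits = find_ssrs(example_contig)
```

Coordinates are zero-based and end-exclusive. A repeat found this way would still need enough flanking sequence on both sides for primer design, which this sketch does not check.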

Molecular characterisation using newly developed microsatellite markers

A total of 14 novel SSR markers (Table 1) were chosen to characterise thirteen individuals representing nine different cultivars. The polymerase chain reaction (PCR) was carried out as proposed by Schuelke [Citation15]: one primer of each pair (either forward or reverse) carried a non-fluorescent M13 sequence tail [M13 sequence: TGTAAAACGACGGCCAGT], and the product was labelled with an M13 adaptor carrying a fluorescent dye (FAM, PET, NED or VIC). Each PCR reaction was prepared to a final volume of 10 μL and consisted of 1x buffer (Invitrogen, United States), 10 μmol/L of each forward and reverse primer, 5 μmol/L fluorescence-labelled M13 adaptor, 2 μmol/L of each deoxyribonucleoside triphosphate (dNTP) (Invitrogen, United States), 0.1 μL of bovine serum albumin (BSA) as a PCR enhancer and 1 U of Taq polymerase (Invitrogen, United States). Amplification was performed in a GeneAmp® PCR System 9700 (Applied Biosystems, United States). The PCR profile consisted of pre-denaturation at 94 °C for 2 min, followed by 35 cycles of 94 °C for 30 s, 41–65 °C for 45 s (depending on the primer pair), and 72 °C for 45 s, with a final extension at 72 °C for 5 min.
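As a minimal illustration of the tailed-primer design, the sketch below prepends the M13 sequence quoted in the text to a locus-specific primer. The primer sequence and the function name are hypothetical and are not taken from Table 1; placing the tail at the 5' end follows the usual M13-tailed primer layout and is assumed here.

```python
M13_TAIL = "TGTAAAACGACGGCCAGT"  # M13 sequence quoted in the text


def add_m13_tail(primer, tail=M13_TAIL):
    """Return the primer with the M13 tail attached at its 5' end."""
    primer = primer.strip().upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    return tail + primer


# Hypothetical locus-specific primer, for illustration only.
example_primer = "ACGTTGCAAGCTTAGGCATC"
tailed_primer = add_m13_tail(example_primer)
tail_length = len(M13_TAIL)
```

Because the tail is incorporated into the amplicon, fragment sizes scored from tailed products are longer than the locus-specific amplicon by the length of the tail (18 bp for this sequence).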

Table 1. List of SSR markers used to characterise taro cultivars.

Following amplification, the PCR products of up to four primer pairs, each labelled with a different fluorescent dye, were pooled for multiplexing. The pooled products were mixed with Hi-Di formamide and the GeneScan 500 LIZ size standard before being resolved on an ABI 3130xL Genetic Analyser (Applied Biosystems, United States). GeneMapper Version 5 (Thermo Fisher Scientific, United States) was used to score allele sizes. The electropherograms were scored and analysed as described by Arif et al. [Citation16].

The PowerMarker software was used to calculate the number of alleles, major allele frequency, gene diversity, heterozygosity and polymorphism information content (PIC) of each microsatellite marker. The same software was used to calculate the shared-allele genetic distance [Citation17]. The dendrogram was constructed with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) from the resulting shared-allele genetic distance matrix and visualised in MEGA7 [Citation18].
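For readers unfamiliar with these per-locus statistics, the following sketch computes them from a list of genotypes using the common textbook definitions: gene diversity as 1 − Σp², and PIC as gene diversity minus Σ 2p²q² over allele pairs. It assumes each individual is scored as two alleles and applies no sample-size correction, so values may differ slightly from PowerMarker output. The function name and the genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarise one locus from a list of (allele_a, allele_b) genotypes."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]

    # Gene diversity (expected heterozygosity), uncorrected form.
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    # PIC subtracts the pairwise term 2 * p_i^2 * p_j^2 for every allele pair.
    pic = gene_diversity - sum(
        2.0 * p * p * q * q for p, q in combinations(freqs, 2)
    )
    # Observed heterozygosity: share of individuals with two different alleles.
    heterozygosity = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    return {
        "alleles": len(counts),
        "major_allele_frequency": max(freqs),
        "gene_diversity": gene_diversity,
        "heterozygosity": heterozygosity,
        "pic": pic,
    }


# Hypothetical allele sizes (bp) for four individuals at one locus.
example_genotypes = [(150, 150), (150, 154), (154, 154), (150, 154)]
stats = locus_stats(example_genotypes)
```

With two alleles at equal frequency, as in this example, gene diversity is 0.5 and PIC is 0.375, which shows why PIC is never larger than gene diversity for the same locus.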

Results

RAD-seq sequencing and microsatellite analysis

In this study, a total of 8.49 million clean reads were produced. Using Velvet, these clean reads were assembled into 222,445 contigs. A total of 3,280 contigs were found to contain 3,454 repeats (Table 2). Of the 3,454 repeats, 2,760 (79.9%) were dinucleotide repeat motifs, 631 (18.3%) trinucleotide repeat motifs, 40 (1.2%) tetranucleotide repeat motifs, 15 (0.4%) pentanucleotide repeat motifs and 8 (0.2%) hexanucleotide repeat motifs (Table 3). Six types of dinucleotide repeat motifs (CT, AG, AT, AC, GT and CG) were found in this study; the most abundant was CT, with 707 repeats, and the least abundant was CG. Twenty types of trinucleotide repeat motifs were found, of which AAG was the most abundant, with 103 repeats (16.3%), and ACG the least abundant, with only one repeat identified. The details of the di- and trinucleotide repeat motifs are summarised in Table 4. Of all the repeats found, flanking primers were successfully designed for only 1,790. Fourteen of these 1,790 microsatellites were then randomly selected for characterisation of the Malaysian cultivated taro.

Table 2. Statistical summary of RAD sequencing data for microsatellites.

Table 3. The abundance of repeat motifs type found in this study.

Table 4. Abundance of the motif.
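The proportions reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Repeat counts by motif length, as reported in the text.
motif_counts = {"di": 2760, "tri": 631, "tetra": 40, "penta": 15, "hexa": 8}

total_repeats = sum(motif_counts.values())

# Share of each motif class among all repeats, in percent.
motif_share = {
    name: round(100 * count / total_repeats, 1)
    for name, count in motif_counts.items()
}

# AAG repeats as a share of all trinucleotide repeats.
aag_share = round(100 * 103 / motif_counts["tri"], 1)

# Share of repeats for which flanking primers could be designed.
primer_success = round(100 * 1790 / total_repeats, 1)
```

The class counts sum to the 3,454 repeats reported, and the computed shares match the percentages given in the text. On these figures, flanking primers were designed for roughly half of the repeats.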

Molecular characterisation of Malaysian cultivated Taro

Of the 14 novel SSR markers analysed in this study, only twelve were polymorphic; the remaining two, CesSSR9 and CesSSR11, were monomorphic and were therefore discarded from the analysis. The polymorphic SSR markers revealed 50 alleles, ranging from two (CesSSR5) to seven (CesSSR13) per locus, with an average of 4.17 alleles per locus. The polymorphism information content values ranged from 0.05 (CesSSR5) to 0.59 (CesSSR7), with an average of 0.33. Gene diversity ranged from 0.06 (CesSSR5) to 0.63 (CesSSR6), and heterozygosity ranged from 0.00 (CesSSR4 and CesSSR5) to 0.68 (CesSSR6). The details of SSR characterisation are summarised in Table 5.

Table 5. Characterisation of novel microsatellite markers.

The UPGMA dendrogram (Figure 1) showed that only cultivar MINYAK could not be separated from cultivar PUTIH. Moreover, the dendrogram showed no variation within cultivar PUTIH, whereas intra-variety variation was observed between WANGI A1 and WANGI B4. The pairwise genetic distance based on shared alleles was 0.00 within cultivar PUTIH and 0.00 between cultivar PUTIH and cultivar MINYAK. The highest pairwise genetic distance, 0.75, was observed between cultivar TELUR and cultivar WANGI A1 (Table 6).

Figure 1. UPGMA dendrogram generated based on shared allele genetic distance of studied cultivars.

Table 6. Pairwise genetic distance of studied cultivars based on 12 novel microsatellite markers.
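To show how a shared-allele distance of 0.00 arises, the sketch below implements one simple form of the measure: one minus the proportion of alleles two individuals share, averaged over loci. This is an assumption-laden simplification and PowerMarker's exact computation may differ. The sample names, loci and allele sizes are hypothetical and are not the study's genotypes.

```python
from collections import Counter


def shared_allele_distance(ind_a, ind_b):
    """Return 1 minus the mean proportion of shared alleles over common loci.

    Each individual is a dict mapping locus name to a tuple of allele sizes.
    """
    loci = sorted(set(ind_a) & set(ind_b))
    if not loci:
        raise ValueError("the two individuals share no scored loci")
    shared_total = 0.0
    for locus in loci:
        alleles_a, alleles_b = Counter(ind_a[locus]), Counter(ind_b[locus])
        # Multiset intersection counts alleles present in both genotypes.
        shared = sum((alleles_a & alleles_b).values())
        shared_total += shared / max(len(ind_a[locus]), len(ind_b[locus]))
    return 1.0 - shared_total / len(loci)


# Hypothetical genotypes at two loci.
sample_x = {"locus1": (150, 154), "locus2": (200, 200)}
sample_y = {"locus1": (150, 154), "locus2": (200, 200)}  # same profile as x
sample_z = {"locus1": (150, 158), "locus2": (200, 204)}

dist_xy = shared_allele_distance(sample_x, sample_y)
dist_xz = shared_allele_distance(sample_x, sample_z)
```

Two samples with identical allele profiles at every scored locus give a distance of 0.00, so they cannot be separated in a dendrogram built from these distances. A matrix of such pairwise values is what average-linkage (UPGMA) clustering takes as input.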

Discussion

Conventional vs. NGS based microsatellites development

Advances in NGS platforms have provided a fast and cost-effective method for microsatellite discovery and development. Paired-end RAD-seq in particular yields longer assembled sequences, which improves the rate of microsatellite identification. The method has also efficiently produced large sets of polymorphic microsatellite markers, from hundreds to thousands, which can be used for a broad range of research on non-model or understudied species [Citation19–21].

Compared with the more conventional hybridisation approach to microsatellite marker development, RAD-seq combined with de novo assembly is more efficient in both time and cost. The conventional hybridisation approach requires between two and four weeks of technical work and yields a limited number of microsatellite markers, whereas the approach adopted here requires only one to two weeks to obtain a large number of microsatellite markers [Citation22].

Conventional microsatellite development methods, including the hybridisation approach, typically generate sequence from a single individual and fail to account for repetitive elements and insertion-deletion mutations flanking the target locus, which can affect genotyping success and precision.

By contrast, numerous studies have used NGS for microsatellite development [Citation23–25]. Yang et al. [Citation25] detected 650 microsatellite loci from 4.5 million raw RAD reads, with primer pairs designed successfully for only 285 (43.84%). In the present study, 3,285 microsatellites were found in 222,445 assembled sequences, and 1,790 primer pairs were designed successfully. With the conventional selective hybridisation approach, Badri et al. [Citation26] obtained only 350 white colonies, of which only 52 had unique SSR regions for sesame. Paliwal et al. [Citation27], who also applied the conventional approach, obtained 356 white colonies, of which 114 (32.02%) contained SSR repeats. These comparisons suggest that the RAD-seq approach to microsatellite development is more useful and efficient in terms of time, cost and labour.

Genetic conservation and cultivar identification

The conservation of taro and its genetic resources requires the identification, characterisation and evaluation of genotypes so that they can be used efficiently in breeding activities and in sound conservation management programmes. Biochemical, morphological and molecular techniques have been applied worldwide to identify different plant species. Of these, molecular markers have proven to be the preferred approach, because they capture plant-specific features that can be recorded with both confidence and precision.

Conventionally, taro is characterised morphologically on the basis of its leaves, flowers, fruit, seeds and other traits. Although morphological characterisation remains important for cultivar identification, relying on it alone often gives incorrect and conflicting outcomes, because morphological traits are affected by ecological and phenological factors and are limited in number. Molecular markers therefore offer a more reliable alternative to morphological markers. In this study, for example, microsatellite markers detected intra-variety variation between WANGI A1 and WANGI B4, which on morphological grounds were both assumed to be cultivar WANGI.

Furthermore, despite the growing prevalence of SNP-based genetic studies, microsatellites remain among the most informative and flexible markers available across genetic disciplines. High-throughput microsatellite markers are used extensively in plant genomic research because of their highly polymorphic, multi-allelic nature, relative simplicity and transferability across genotypes [Citation7, Citation28]. Microsatellite markers are therefore a valuable tool for identifying taro cultivars and provide essential data for future biodiversity conservation and gene bank management.

Author contributions

S.A.R.: designed the whole project, analysed the data and wrote the manuscript. M.N.G.: sampling and morphological characterization and identification. N.H.E.N.A. and A.M.A.M.: performed the experiment. S.N.I.: scored the alleles and reviewed the manuscript. All authors have read and approved the manuscript.

Acknowledgement

The authors would like to acknowledge everyone who was indirectly involved in this project.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability

All data that support the findings reported in this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This research was funded by the Malaysian government under the NKEA EPP14 project.

References

  • Jianchu X, Yongping Y, Yingdong P, et al. Genetic diversity in taro (Colocasia esculenta Schott, Araceae) in China: an ethnobotanical and genetic approach. Econ Bot. 2001;55(1):14–31.
  • Matthews P. A possible tropical wildtype taro: Colocasia esculenta var. aquatilis. BIPPA. 1991;11(0):69–81.
  • Kreike CM, Van Eck HJ, Lebot V. Genetic diversity of taro, Colocasia esculenta (L.) Schott, in Southeast Asia and the Pacific. Theor Appl Genet. 2004;109(4):761–768.
  • Hassani SMR, Talebi R, Pourdad SS, et al. Morphological description, genetic diversity and population structure of safflower (Carthamus tinctorius L.) mini core collection using SRAP and SSR markers. Biotechnol Biotechnol Equip. 2020;34(1):1043–1055.
  • Vukosavljev M, Esselink GD, van ‘t Westende WPC, et al. Efficient development of highly polymorphic microsatellite markers based on polymorphic repeats in transcriptome sequences of multiple individuals. Mol Ecol Resour. 2015;15(1):17–27.
  • Oliveira EJ, Gomes Pádua J, Zucchi MI, et al. Origin, evolution and genome distribution of microsatellites. Genet Mol Biol. 2006;29(2):294–307.
  • Ab Razak S, Radzuan SM, Mohamed N, et al. Development of novel microsatellite markers using RAD sequencing technology for diversity assessment of rambutan (Nephelium lappaceum L.) germplasm. Heliyon. 2020;6(9):e05077.
  • Zalapa JE, Cuevas H, Zhu H, et al. Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. Am J Bot. 2012;99(2):193–208.
  • McCormack JE, Maley JM, Hird SM, et al. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 2012;62(1):397–406.
  • Miller MR, Dunham JP, Amores A, et al. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007;17(2):240–248.
  • Wang JY, Yan SY, Hui WK, et al. SNP discovery for genetic diversity and population structure analysis coupled with restriction-associated DNA (RAD) sequencing in walnut cultivars of Sichuan Province, China. Biotechnol Biotechnol Equip. 2020;34(1):652–664.
  • Catchen J, Hohenlohe PA, Bassham S, et al. Stacks: an analysis tool set for population genomics. Mol Ecol. 2013;22(11):3124–3140.
  • Namiki T, Hachiya T, Tanaka H, et al. MetaVelvet: an extension of velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 2012;40(20):e155.
  • Beier S, Thiel T, Münch T, et al. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33(16):2583–2585.
  • Schuelke M. An economic method for the fluorescent labeling of PCR fragments. Nat Biotechnol. 2000;18(2):233–234.
  • Arif IA, Khan HA, Shobrak M, et al. Interpretation of electrophoretograms of seven microsatellite loci to determine the genetic diversity of the Arabian Oryx. Genet Mol Res. 2010;9(1):259–265.
  • Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21(9):2128–2129.
  • Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–1874.
  • Barchi L, Lanteri S, Portis E, et al. Identification of SNP and SSR markers in eggplant using RAD tag sequencing. BMC Genomics. 2011;12:304.
  • Davey JW, Hohenlohe PA, Etter PD, et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011;12(7):499–510.
  • Tian Z, Zhang F, Liu H, et al. Development of SSR markers for a Tibetan medicinal plant, Lancea tibetica (Phrymaceae), based on RAD sequencing. Appl Plant Sci. 2016;4:1600076.
  • Zane L, Bargelloni L, Patarnello T. Strategies for microsatellite isolation: a review. Mol Ecol. 2002;11(1):1–6.
  • Bonatelli IA, Carstens BC, Moraes EM. Using next generation RAD sequencing to isolate multispecies microsatellites for Pilosocereus (Cactaceae). PloS One. 2015;10(11):e0142602.
  • Minegishi Y, Ikeda M, Kijima A. Novel microsatellite marker development from the unassembled genome sequence data of the marbled flounder Pseudopleuronectes yokohamae. Mar Genomics. 2015;24 Pt 3:357–361.
  • Yang GQ, Chen YM, Wang JP, et al. Development of a universal and simplified ddRAD library preparation approach for SNP discovery and genotyping in angiosperm plants. Plant Methods. 2016;12:39.
  • Badri J, Yepuri V, Ghanta A, et al. Development of microsatellite markers in sesame (Sesamum indicum L.). Turk J Agric For. 2014;38:603–614.
  • Paliwal R, Kumar R, Choudhury DR, et al. Development of genomic simple sequence repeats (g-SSR) markers in Tinospora cordifolia and their application in diversity analyses. Plant Gene. 2016;5:118–125.
  • Vieira ML, Santini L, Diniz AL, et al. Microsatellite markers: what they mean and why they are so useful. Genet Mol Biol. 2016;39(3):312–328.