961
Views
1
CrossRef citations to date
0
Altmetric
Silviculture and Plant Sciences

Haplotype-resolved de novo genome assemblies of four coniferous tree species

ORCID Icon, , , , , ORCID Icon, , ORCID Icon & ORCID Icon show all
Pages 151-157 | Received 05 Jun 2023, Accepted 01 Oct 2023, Published online: 16 Oct 2023

ABSTRACT

Coniferous trees in gymnosperm are an important source of wood production. Because of their long lifecycle, the breeding programs of coniferous tree are time- and labor-consuming. Genomics could accelerate the selection of superior trees or clones in the breeding programs; however, the genomes of coniferous trees are generally giant in size and exhibit high heterozygosity. Therefore, the generation of long contiguous genome assemblies of coniferous species has been difficult. In this study, we employed high-fidelity (HiFi) long-read sequencing technology to sequence and assemble the genomes of four coniferous tree species, Larix kaempferi, Chamaecyparis obtusa, Cryptomeria japonica, and Cunninghamia lanceolata. Genome assemblies of the four species totaled 13.5 Gb (L. kaempferi), 8.5 Gb (C. obtusa), 9.2 Gb (C. japonica), and 11.7 Gb (C. lanceolata), which covered 99.6% of the estimated genome sizes on average. The contig N50 value, which indicates assembly contiguity, ranged from 1.2 Mb in C. obtusa to 16.0 Mb in L. kaempferi, and the assembled sequences contained, on average, 89.2% of the single-copy orthologs conserved in embryophytes. Assembled sequences representing alternative haplotypes covered 70.3–95.1% of the genomes, suggesting that the four coniferous tree genomes exhibit high heterozygosity levels. The genome sequence information obtained in this study represents a milestone in tree genetics and genomics, and will facilitate gene discovery, allele mining, phylogenetics, and evolutionary studies in coniferous trees, and accelerate forest tree breeding programs.

Introduction

Forests cover approximately 31% of the global land area (FAO Citation2020). In Japan, forest area is as high as more than 60%, of which approximately 40% is accounted for by artificial forests, mainly coniferous trees, which have been planted for applications in the forestry industry (Forestry Agency Citation2022). Cryptomeria japonica, a Cupressaceae family member endemic to Japan, is the most important forestry species in the country, occupying > 40% of the artificial forests (Forestry Agency Citation2022). Chamaecyparis obtusa, another member of the Cupressaceae family, is also an important conifer species with superior-quality wood that has been used for the construction of buildings in Japan since the ancient times (Tsumura et al. Citation2007). Cunninghamia lanceolata (Cupressaceae) was introduced from China and Taiwan, and has been recognized as a new woody resource in Japan in recent years because of its fast growth and desirable wood properties (Fujisawa Citation2017). Additionally, Larix kaempferi (Pinaceae) is an important breeding tree species in the northern part of Japan (Kurinobu Citation2005). Timber tree germplasm and breeding programs have been used to improve wood production and quality (Takahashi et al. Citation2023). However, the breeding programs of coniferous trees are generally time-consuming because their lifecycle can be as long as or longer than 50 years. Furthermore, the highly heterozygous genomes and outbreeding mating system of coniferous tree species complicate the breeding systems (Burdon and Wilcox Citation2011).

Since the genomes of coniferous trees are often giant in size (>10 Gb), the analysis of the genome sequence data of these species remains challenging (Neale and Wheeler Citation2019). Owing to the recent advancements in long-read sequencing technologies, the genomes of gymnosperm species belonging to six families, including Cupressaceae, Cycadaceae, Ginkgoaceae, Pinaceae, Taxaceae, and Welwitschiaceae, have become available (Wan et al. Citation2022). The giant genomes of gymnosperm species are rich in repetitive sequences such as transposable elements (Ohri Citation2021). The high-fidelity (HiFi) long-read sequencing technology generates reads spanning the repetitive sequences, resulting in an assembly that covers most of the gene spaces in the genome (Hon et al. Citation2020).

Coniferous tree genomics using the greatly advanced sequencing technologies promises the acceleration of not only the breeding programs but also the phylogenetic study of coniferous species. In this study, we determined the genome sequences of four coniferous tree species, including L. kaempferi, C. obtusa, C. japonica, and C lanceolata, using the HiFi long-read sequencing technology. The genome sequence information obtained in this study could serve as a useful resource for breeding coniferous trees, and for understanding the physiology of forest trees and the population genetics and phylogenetics of gymnosperms.

Materials and methods

Plant materials and DNA extraction

Four coniferous tree species were used in this study: Larix kaempferi (GFE32203) was planted at the Tohoku Regional Breeding Office, Forest Tree Breeding Center, Forestry and Forest Products Research Institute, Forest Research and Management Organization in Iwate, Japan; Chamaecyparis obtusa (GFB00119) and Cryptomeria japonica (GFA01029) were planted at the Forest Tree Breeding Center, Forestry and Forest Products Research Institute, Forest Research and Management Organization in Ibaraki, Japan; and Cunninghamia lanceolata (GFHN00090) was also planted at Forest Tree Breeding Center, Forestry and Forest Products Research Institute, Forest Research and Management Organization in Ibaraki, Japan. The original C. lanceolata tree was planted at the Kiyosumi Work Station in the University of Tokyo Chiba Forest in Chiba, Japan.

Genomic DNA was extracted from the young leaves of each tree species using Genomic-tips (Qiagen, Hilden, Germany). DNA concentration was measured using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, U.S.A.), and DNA fragment length was evaluated by agarose gel electrophoresis with Pippin Pulse (Sage Science, Beverly, MA, U.S.A.).

DNA library preparation and sequencing

Long-insert DNA libraries for PacBio HiFi sequencing were prepared using six protocols (). Genomic DNA was sheared either by six centrifugations at 1,600 × g in the g-Tube (Covaris, Woburn, MA, U.S.A.) or in Megaruptor 2 (Deagenode, Liege, Belgium) with the Large Fragment Hydropore mode and mean fragment sizes of 20, 30, or 40 kb. Then, DNA library construction was performed with the SMRTbell Express Template Prep Kit 2.0 (PacBio), according to the manufacturer’s instructions. The obtained DNA libraries were fractionated with BluePippin (Sage Science) to eliminate fragments shorter than 15, 20, or 25 kb in length. The fractionated DNA libraries were sequenced on SMRT cells on the Sequel II and Sequel IIe system (PacBio). HiFi reads were constructed with the CCS pipeline (https://ccs.how).

Table 1. HiFi library preparation protocols used in this study.

Short-insert libraries were prepared with a PCR-free Swift 2S Turbo Flexible DNA Library Kit (Ann Arbor, MI, U.S.A.) and converted into a DNA nanoball sequencing library with an MGI Easy Universal Library Conversion Kit (MGI Tech, Shenzhen, China). The library was sequenced on a DNBSEQ G400RS (MGI Tech) instrument in paired-end, 101-bp mode.

Genome sequence assembly

Genome sizes of the four tree species were estimated with Genomic Character Estimator (GCE) (Liu et al. Citation2013), based on the k-mer frequency (k = 21) calculated from the HiFi reads with Jellyfish version 2.3.0 (Marçais and Kingsford Citation2011). The reads were assembled using Hifiasm version 0.16.1 (Cheng et al. Citation2021) with default parameters. Assembly completeness was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) version 5.2.2, with default parameters (Simão et al. Citation2015), using the lineage dataset embryophyta_odb10 (eukaryota, 10 September 2020). To assess the quality of the four genome assemblies, the short reads obtained from DNBSEQ G400RS (MGI Tech) were aligned on the genome sequences of the four species with Bowtie 2 (version 2.3.5.1 with the default parameters) (Langmead and Salzberg Citation2012).

Gene sequence alignment analysis

Publicly available gene sequences were obtained from GenBank DNA database (Benson et al. Citation2013). The accession numbers for the sequences were ICRN01000001–ICRN01050690 for L. kaempferi (Mishima et al. Citation2022), BW986416–BW992245 for C. obtusa (Ujino-Ihara et al. Citation2005), IABU01000001–IABU01034731 for C. japonica (Mishima et al. Citation2018), and JU981479–JV043149 for C. lanceolata (Huang et al. Citation2012). These sequences were aligned on each assembly with Minimap2 (version 2.24 with the parameter of -x splice) (Li and Birol Citation2018). Unmapped sequences were identified by a sequence similarity search to non-redundant proteins using DIAMOND (Buchfink et al. Citation2021) with an E-value cutoff of < 1E–10 and taxonomy was classified in accordance with the definition of GenBank DNA database.

Results

Long-read DNA sequencing

We extracted genome DNA from L. kaempferi, C. obtusa, C. japonica, and C lanceolata and confirmed that the extracted DNA sizes were more than 40 kb (data not shown). To maximize data production efficiency through genome sequencing of the four trees, we optimized DNA library preparation protocols with six conditions (Protocol1–Protocol6), with slight modification to the sheared DNA fragment sizes and cutoff values to eliminate short DNA in libraries (). The resultant libraries were sequenced on a total of 38 SMRT cells: 2 cells, Protocol1; 23 cells, Protocol2; 1 cell, Protocol3; 3 cells, Protocol4; 2 cells, Protocol5; and 7 cells, Protocol6.

The number, N50 value, and total length of HiFi reads per cell varied among the protocols (). The total length of HiFi reads per cell ranged, on average, from 23.2 Gb in Protocol2 to 35.8 Gb in Protocol6, while the read N50 value of these reads ranged, on average, from 16.3 kb in Protocol4 to 27.8 kb in Protocol6. The read N50 was expectedly long in long-insert libraries prepared using Protocol3 and Protocol6.

Figure 1. Total length, N50 value, and number of HiFi reads obtained from DNA libraries prepared using six different protocols. The different symbols (square, circle, triangle, plus, cross, and diamond) indicate the different protocols (1, 2, 3, 4, 5, and 6, respectively; detailed in table 1), and different colors (red, blue, gray, and black) indicate the different tree species (Larix kaempferi, Chamaecyparis obtusa, Cryptomeria japonica, and Cunninghamia lanceolata, respectively).

Figure 1. Total length, N50 value, and number of HiFi reads obtained from DNA libraries prepared using six different protocols. The different symbols (square, circle, triangle, plus, cross, and diamond) indicate the different protocols (1, 2, 3, 4, 5, and 6, respectively; detailed in table 1), and different colors (red, blue, gray, and black) indicate the different tree species (Larix kaempferi, Chamaecyparis obtusa, Cryptomeria japonica, and Cunninghamia lanceolata, respectively).

Genome assembly of L. kaempferi

A total of 251.4 Gb HiFi reads (median N50 = 20.9 kb) were obtained from 11 SMRT cells (). In the k-mer distribution analysis, two peaks were detected at k-mer multiplicities of 10 and 19 (); the former and latter multiplicities corresponded to heterozygous regions of the diploid genome and homozygous regions of haploid genome, respectively. The haploid genome size of L. kaempferi was estimated to be 13.2 Gb (). Accordingly, the genome coverage of the HiFi reads was calculated as 19× ( = 251.4 Gb/13.2 Gb), which was sufficient for de novo genome assembly. The reads were assembled with a haplotype-resolved de novo assembly method to obtain primary contigs (long continuous stretches of contiguous sequences from one haplotype) and alternate contigs (contiguous sequences including sequence and structural variants from another haplotype). The primary contigs included 4,655 sequences spanning 13.5 Gb, with an N50 value of 16.0 Mb, which corresponded to the estimated size (). The complete BUSCO score was 89.1% (single-copy BUSCO score = 77.5%, duplicated BUSCO score = 11.6%) (). A map rate of the short reads was 98.8% to cover 99.8% of the primary contig length. On the other hand, the total length of alternate contigs was 9.5 Gb (N50 = 1.7 Mb) (). The ratio of the length of alternative contigs to that of primary contigs was 70.3% ( = 9.5 Gb/13.5 Gb).

Figure 2. Estimation of the genome sizes of L. kaempferi, C. obtusa, C. japonica, and C. lanceolata by k-mer distribution analysis. The k-mer size of 21 was used for the four species.

Figure 2. Estimation of the genome sizes of L. kaempferi, C. obtusa, C. japonica, and C. lanceolata by k-mer distribution analysis. The k-mer size of 21 was used for the four species.

Table 2. Genome assembly statistics of four timber tree species.

Genome assembly of C. obtusa

A total of 254.5 Gb HiFi reads (median N50 = 23.0 kb) were obtained from 10 SMRT cells (). Two peaks were detected at k-mer multiplicities of 16 (diploid) and 32 (haploid), and the genome size of C. obtusa was estimated to be 7.9 Gb (). The genome coverage of HiFi reads was 30× ( = 254.5 Gb/7.9 Gb). The HiFi reads were assembled into 15,466 primary contigs spanning 8.5 Gb, with an N50 value of 1.2 Mb (). The total contig length corresponded to the estimated genome size. The complete BUSCO score was 89.4% (single-copy BUSCO = 82.3%, and duplicated BUSCO = 7.1%) (). A map rate of the short reads was 99.0% to cover 99.7% of the primary contig length. The alternate contig length was 7.8 Gb, with an N50 value of 0.5 Mb (). The ratio of alternate contig length to primary contig length was 91.3% ( = 7.8 Gb/8.5 Gb).

Genome assembly of C. japonica

A total of 237.6 Gb HiFi reads (median N50 = 17.3 kb) were obtained from nine SMRT cells (). Two peaks were detected at k-mer multiplicities of 12 (diploid) and 24 (haploid), and the genome size of C. japonica was estimated to be 9.9 Gb (). The genome coverage of HiFi reads was 24× ( = 237.6 Gb/9.9 Gb). The HiFi reads were assembled into 2,741 primary contigs spanning 9.2 Gb, with an N50 value of 8.3 Mb (). The total contig length covered 93.4% of the estimated genome size. The complete BUSCO score was 89.0% (single-copy BUSCO = 82.3%, duplicated BUSCO = 6.7%) (). A map rate of the short reads was 98.6% to cover 99.9% of the primary contig length. The alternate contig length was 8.7 Gb, with an N50 value of 2.1 Mb (). The ratio of alternate contig length to primary contig length was 94.3% ( = 8.7 Gb/9.2 Gb).

Genome assembly of C. lanceolata

A total of 289.1 Gb HiFi reads (median N50 = 27.8 kb) were obtained from eight SMRT cells (). Two peaks were detected at k-mer multiplicities of 12 (diploid) and 24 (haploid), and the genome size of C. lanceolata was estimated to be 12.0 Gb (). The genome coverage of HiFi reads was 24× ( = 289.1 Gb/12.0 Gb). The HiFi reads were assembled into 2,472 primary contigs spanning 11.5 Gb, with an N50 value of 11.7 Mb (). The total contig length covered 95.9% of the estimated genome size. The complete BUSCO score was 89.1% (single-copy BUSCO = 80.8%, duplicated BUSCO = 8.3%) (). A map rate of the short reads was 98.2% to cover 99.9% of the primary contig length. The alternate contig length was 11.0 Gb, with an N50 value of 2.9 Mb (). The ratio of alternate contig length to primary contig length was 95.1% ( = 8.7 Gb/9.2 Gb).

Gene sequence alignment on the assemblies

To gain information on gene positions on the genome assemblies, we searched a public database to obtain 50,690, 5,830, 34731, and 61,670 gene sequences for L. kaempferi, C. obtusa, C. japonica, and C. lanceolata, respectively. Of the sequences 40,628 L. kaempferi (80.1%), 5,814 C. obtusa (99.7%), 34468 C. japonica (99.2%), and 53,750 C. lanceolata (87.2%) sequences were aligned on the genomes. Totals of 6,346 (12.5%) and 5,210 (8.5%) unmapped sequences in L. kaempferi and C. lanceolata showed similarities to protein sequences of Ascomycota and Mucoromycota, respectively, followed by unreported proteins (3.4% in L. kaempferi and 2.0% in C. lanceolata) (Supplementary Table S1).

Discussion

Here, we report the genome assemblies of four gymnosperm coniferous tree species, L. kaempferi, C. obtusa, C. japonica, and C. lanceolata (). In the four species, the assembly sizes were compared with the estimated sizes () and most of short reads were fully covered the assemblies. This indicated that the genome assemblies obtained would probably represent the entire genome sequences. Furthermore, gene sequences excluding potential contaminated sequences from fungi were almost mapped on the assembly sequences (Supplementary Table S1). The genome assemblies from this study would possess enough quality for genomics and genetics in the four gymnosperm coniferous tree species, L. kaempferi, C. obtusa, C. japonica, and C. lanceolata.

Since the genomes of these species were giant in size (~10 Gb), similar to those of other gymnosperm species, we have been optimizing the DNA library preparation protocols to maximize the data production efficiency. Among the six protocols (), Protocol6 was the most effective with respect to the data yield (median 35.8 Gb) and read N50 length (median 27.8 kb), although the number of reads obtained using Protocol6 was less than that obtained using Protocol1, Protocol4, and Protocol5 (). Finally, we found that, as in Protocol6, genome DNA should be sheared with Megaruptor 2 and that shearing condition and library cutoff should be as long as possible if high-molecular weight genome DNA (>40 kb) was extracted. We concluded that Protocol6 might save time and costs for forthcoming genome sequence analysis for the giant genomes, even though it was still unclear the relationship between protocols and target species.

In addition, the genome coverage of HiFi reads differed among the four species. In C. lanceolata, HiFi read N50 length was the highest (27.8 kb; ), and genome coverage was the second highest (24×) among the four species. On the other hand, the HiFi read N50 length of L. kaempferi (20.9 kb) was lower than that of C. lanceolata (), and genome coverage (19×) in L. kaempferi was the lowest among the four species. Unexpectedly, L. kaempferi showed the longest sequence contiguity, followed by C. lanceolata (). The heterozygosity level of genomes might affect the assembled sequence contiguity. The ratio of alternate contig length to primary contig length (70.3% in L. kaempferi and 95.1% in C. lanceolata) might support this assumption. However, the C. obtusa assembly was remarkably fragmented, even though its HiFi read N50 (23.0 kb) () and genome coverage (30×) were comparable with those of C. lanceolata. This suggests that not only the heterozygosity level but also other genome features, such as the length and/or distribution patterns of repetitive elements, affect the contiguity of the assembled sequence.

In the phylogenetic tree of seed plants, gymnosperms occupy a basal position (Chase et al. Citation1993) that branches out into the different clades of angiosperms. Therefore, genetic and genomic studies on gymnosperms could shed light on the evolutionary history of plants. However, the genomic investigation of gymnosperms has been lagging behind that of angiosperms because gymnosperms possess large-sized genomes (≥10 Gb) and are relatively less commercially important than angiosperms (Wan et al. Citation2022). The advent of sequencing technologies has steered this situation in favor of gymnosperm genomics. Long-read sequencing technologies have enabled the sequencing and assembly of giant gymnosperm genomes rich in repetitive sequences, which was not possible with short-read sequencing technologies (Wan et al. Citation2022). The genome assemblies of four gymnosperm species generated in this study, in addition to those of other species sequenced previously (Wan et al. Citation2022), will provide new insights into the genome evolution of plants.

The four coniferous species sequenced in this study are important for the forestry. Genome assemblies of these four coniferous tree species could be used as references for the identification of sequence and structural variants in the genomes of divergent cultivars and breeding materials belonging to the same species. Furthermore, in previous genetic and genomic studies, the utilization of a genome sequence as a reference also enabled the identification of genes of interest (Neale and Kremer Citation2011). The variant and gene information obtained in this study could be used for the development of DNA markers to facilitate genetic studies and breeding programs (Muranty et al. Citation2014). Genome prediction might be another powerful tool for the selection of elite tree lines from breeding programs (Lebedev et al. Citation2020; Grattapaglia Citation2022), which usually require a long time. In addition, gene editing technology could also be used as an effective breeding strategy for shortening the duration of tree breeding programs (Bewg et al. Citation2018; Goralogia et al. Citation2021).

The genome sequence information obtained in this study could contribute to breeding programs, gene discovery, and allele mining in coniferous tree species with giant genomes. Since the coverage of genome sequences was as high as ~ 90%, according to BUSCO evaluation, the assemblies could be used as reference sequences in transcriptome analysis (Mishima et al. Citation2022). The protocols presented in this study would contribute to and accelerate the genome sequence analysis of coniferous species with giant genomes. Moreover, chromosome-level assemblies, which would enable phylogenetics and evolutionary studies based on comparative genomics, and subsequent comprehensive transcriptome analysis could also be established through further genomic and genetic analyses in the near future.

Data archiving statement

Raw sequence reads were deposited in the Sequence Read Archive (SRA) database of the DNA Data Bank of Japan (DDBJ) under the BioProject of PRJDB14450 and the accession numbers DRR413190 - DRR413200 and DRR459944 (L. kaempferi), DRR413180 - DRR413189 and DRR459943 (C. obtusa), DRR413201 - DRR413209 and DRR459945 (C. japonica), and DRR413210 - DRR413217 and DRR459946 (C. lanceolata). The assembled sequences are available at DDBJ (accession numbers: BSBM01000001-BSBM01004655 [L. kaempferi]; BSBK01000001-BSBK01015466 [C. obtusa]; BSBL01000001-BSBL01002741 [C. japonica]; and BSBN01000001-BSBN01002472 [C. lanceolata]), BreedingTrees-by-Genes (http://btg.kazusa.or.jp), and Plant GARDEN (https://plantgarden.jp).

Supplemental material

Supplemental Material

Download MS Excel (13.1 KB)

Acknowledgements

We thank the University of Tokyo Chiba Forest for their support and cooperation during the collection of C. lanceolata leaf samples. We also thank Y. Kishida, M. Kohara, C. Minami, K. Ozawa, H. Tsuruoka, and A. Watanabe (Kazusa DNA Research Institute) for technical assistance. This study was supported in part by the MAFF commissioned project study on “Development of efficient breeding technique aiming at forestry trees with superior carbon storage capacity” (Grant Number JPJ009841), JSPS KAKENHI (22H05172 and 22H05181), and the Kazusa DNA Research Institute Foundation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/13416979.2023.2267304

Additional information

Funding

The work was supported by the KAKENHI [22H05172]; KAKENHI [22H05181]; MAFF [JPJ009841].

References

  • Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2013. GenBank. Nucleic Acids Research. 41(D1):D36–42. doi: 10.1093/nar/gks1195.
  • Bewg WP, Ci D, Tsai C-J. 2018. Genome editing in trees: from multiple repair pathways to long-term stability. Front Plant Sci. 9:1732. doi: 10.3389/fpls.2018.01732.
  • Buchfink B, Reuter K, Drost H-G. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 18(4):366–368. doi: 10.1038/s41592-021-01101-x.
  • Burdon RD, Wilcox PL. 2011. Integration of molecular markers in breeding. In: Plomion C, Bousquet J, and Kole C, editors. Genetics, genomics and breeding of conifers. Boca Raton, FL, USA: CRC Press; p. 47.
  • Chase MW, Soltis DE, Olmstead RG, Morgan D, Les DH, Mishler BD, Duvall MR, Price RA, Hills HG, Qiu Y-L, et al. 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals Of The Missouri Botanical Garden. 80(3):528–580. doi:10.2307/2399846.
  • Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18(2):170–175. doi: 10.1038/s41592-020-01056-5.
  • FAO. 2020. Global forest resources assessment 2020. Rome, Italy: FAO.
  • Forestry Agency. 2022. Annual report on Forest and Forestry in Japan. In: Annual Report Group, editor. Ministry of agriculture, forestry and fisheries. Japan; p. 1–50.
  • Fujisawa Y. 2017. The future of forestry and Chinese fir. Forest Genet Tree Breed. 6:132–136.
  • Goralogia GS, Redick TP, Strauss SH. 2021. Gene editing in tree and clonal crops: progress and challenges. In Vitro Cell Dev Biol -Plant. 57(4):683–699. doi: 10.1007/s11627-021-10197-x.
  • Grattapaglia D. 2022. Twelve years into genomic selection in Forest trees: climbing the slope of enlightenment of marker assisted tree breeding. Forests. 13(10):1554. doi: 10.3390/f13101554.
  • Hon T, Mars K, Young G, Tsai Y-C, Karalius JW, Landolin JM, Maurer N, Kudrna D, Hardigan MA, Steiner CC, et al. 2020. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 7(1):1–11. doi:10.1038/s41597-020-00743-4.
  • Huang H-H, Xu L-L, Tong Z-K, Lin E-P, Liu Q-P, Cheng L-J, Zhu M-Y. 2012. De Novo characterization of the Chinese fir (Cunninghamia lanceolata) transcriptome and analysis of candidate genes involved in cellulose and lignin biosynthesis. Bmc Genom. 13(1):648. doi: 10.1186/1471-2164-13-648.
  • Kurinobu S. 2005. Forest tree breeding for Japanese larch. Eurasian J For Res. 8:127–137.
  • Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with bowtie 2. Nat Methods. 9(4):357–359. doi: 10.1038/nmeth.1923.
  • Lebedev VG, Lebedeva TN, Chernodubov AI, Shestibratov KA. 2020. Genomic selection for Forest tree improvement: methods, achievements and perspectives. Forests. 11(11):1190. doi: 10.3390/f11111190.
  • Li H, Birol I. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34(18):3094–3100. doi: 10.1093/bioinformatics/bty191.
  • Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N, Li Z, Chen Y, Mu D, Fan W. 2013. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv:1308.2012. doi: 10.48550/arXiv.1308.2012.
  • Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27(6):764–770. doi: 10.1093/bioinformatics/btr011.
  • Mishima K, Hirakawa H, Iki T, Fukuda Y, Hirao T, Tamura A, Takahashi M. 2022. Comprehensive collection of genes and comparative analysis of full-length transcriptome sequences from Japanese larch (Larix kaempferi) and kuril larch (Larix gmelinii var. japonica). BMC Plant Biol. 22(1):470. doi: 10.1186/s12870-022-03862-9.
  • Mishima K, Hirao T, Tsubomura M, Tamura M, Kurita M, Nose M, Hanaoka S, Takahashi M, Watanabe A. 2018. Identification of novel putative causative genes and genetic marker for male sterility in Japanese cedar (Cryptomeria japonica D.Don). Bmc Genom. 19(1):277. doi: 10.1186/s12864-018-4581-5.
  • Muranty H, Jorge V, Bastien C, Lepoittevin C, Bouffier L, Sanchez L. 2014. Potential for marker-assisted selection for forest tree breeding: lessons from 20 years of MAS in crops. Tree Genet Genomes. 10(6):1491–1510. doi: 10.1007/s11295-014-0790-5.
  • Neale DB, Kremer A. 2011. Forest tree genomics: growing resources and applications. Nat Rev Genet. 12(2):111–122. doi: 10.1038/nrg2931.
  • Neale DB, Wheeler NC. 2019. Gene and genome sequencing in conifers: modern Era. In: Neale D Wheeler N, editors. The conifers: genomes, variation and evolution. Cham: Springer International Publishing; pp. 43–60.
  • Ohri D. 2021. Variation and evolution of genome size in gymnosperms. Silvae Genet. 70(1):156–169. doi: 10.2478/sg-2021-0013.
  • Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
  • Takahashi M, Miura M, Fukatsu E, Hiraoka Y, Kurita M. 2023. Research and project activities for breeding of Cryptomeria japonica D. Don in Japan. J Forest Res. 28(2):83–97. doi: 10.1080/13416979.2023.2172794.
  • Tsumura Y, Matsumoto A, Tani N, Ujino-Ihara T, Kado T, Iwata H, Uchida K. 2007. Genetic diversity and the genetic structure of natural populations of Chamaecyparis obtusa: implications for management and conservation. Heredity. 99(2):161–172. doi: 10.1038/sj.hdy.6800978.
  • Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y, Yoshimura K, Tsumura Y. 2005. Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers. Plant Mol Biol. 59(6):895–907. doi: 10.1007/s11103-005-2080-y.
  • Wan T, Gong Y, Liu Z, Zhou Y, Dai C, Wang Q. 2022. Evolution of complex genome architecture in gymnosperms. Gigascience. 11:giac078. doi: 10.1093/gigascience/giac078.