Genomics and Marker-Assisted Improvement of Vegetable Crops

Abstract

Vegetables are an integral part of the human diet worldwide. Traditional breeding approaches have been used extensively to develop new cultivars of vegetables with desirable characteristics, including resistance/tolerance to biotic and abiotic stresses, high yield, and an elevated content of compounds beneficial to human health. Technological progress since the early 1980s has revolutionized our ability to study and manipulate genetic variation in crop plants. The development of high-throughput sequencing platforms and accompanying analytical methods has led to sequencing and assembly of a large number of plant genomes, construction of dense and ultra-dense molecular linkage maps, identification of structural variants, and application of molecular markers in breeding programs. Linkage mapping and genome-wide association mapping studies have been used to identify chromosomal locations of genes and QTLs associated with plant phenotypic variations important for crop improvement. This review provides up-to-date information on the status of genomics and marker-assisted improvement of vegetable crops with the focus on tomato, pepper, eggplant, lettuce, spinach, cucumber, and chicory. For each vegetable crop, we present the most recent information on genetic resources, mapping populations, genetic maps, genome sequences, mapped genes and QTLs, and the status of marker-assisted selection and genomic selection, and we discuss future research prospects and the application of novel techniques and approaches.

I. Introduction

Vegetables are an assorted group of crop species whose stems, leaves, fruit, flowers, roots, or seed are important components of the human diet worldwide. They can be consumed raw or cooked, and are usually low in carbohydrates and fats while being a good source of vitamins, minerals, and dietary fiber (Singh and Lebeda, Citation2007). The exact definition of a “vegetable” depends on the terminology used, as classifications differ (e.g., botanical and culinary). Asia is the largest producer of many vegetables, including spinach, eggplants, cucumbers, lettuce, peppers, and tomatoes (Figure 1), while Europe leads in the production of chicory (FAOSTAT, Citation2020). When per capita production is considered, Asia is the main producer of chili peppers, spinach, eggplants, and cucumbers, North America leads in the production of tomatoes, lettuce, and sweet peppers, and Europe is the largest producer of chicory (Figure 2). Many more vegetables are highly popular in different regions of the world; in this review, however, we focus on tomato, pepper, eggplant, lettuce, spinach, cucumber, and chicory.

Figure 1. Percentage of world production of nine vegetable crops on each of six continents in 2018. Percentages were calculated from the world production data (in metric tons of fresh weight) for each crop obtained from FAO (FAOSTAT, Citation2020). The carrots category includes the combined production of carrots and turnips, the chicory category shows production of chicory roots, the peppers category includes the combined production of chili peppers and peppers, the cucumber category includes the combined production of cucumbers and gherkins, and the lettuce category includes the combined production of lettuce and leaf chicory. Note that scales for continents differ.

Figure 2. Relative production of nine vegetable crops after adjusting for the population size of each continent. Production in metric tons of fresh weight per million population was calculated from the 2018 world production data obtained from FAO (FAOSTAT, Citation2020). The carrots category includes the combined production of carrots and turnips, the chicory category shows production of chicory roots, the peppers category includes the combined production of chilies and peppers, the cucumber category includes the combined production of cucumbers and gherkins, and the lettuce category includes the combined production of lettuce and leaf chicory. Note that scales for vegetable crops differ.
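
The two figures rest on two simple calculations: each continent's percentage of the world total for a crop, and production per million inhabitants. The sketch below illustrates both under stated assumptions. The production and population numbers are hypothetical placeholders, not FAOSTAT values, and the function names are ours.

```python
# Hypothetical inputs for ONE crop: production in metric tons and
# population in millions. Illustrative placeholders, NOT FAOSTAT data.
production_t = {"Asia": 600.0, "Europe": 250.0, "Africa": 150.0}
population_m = {"Asia": 4.0, "Europe": 1.0, "Africa": 1.0}


def percent_share(production):
    """Each continent's percentage of the summed production of one crop."""
    total = sum(production.values())
    return {name: 100.0 * tons / total for name, tons in production.items()}


def per_million(production, population):
    """Production in metric tons per million inhabitants."""
    return {name: production[name] / population[name] for name in production}


shares = percent_share(production_t)
relative = per_million(production_t, population_m)

# The continent with the largest share need not lead after the adjustment.
largest_share = max(shares, key=shares.get)
largest_relative = max(relative, key=relative.get)
print(shares)
print(relative)
print(largest_share, largest_relative)
```

In this toy input, the continent with the largest share of total production is not the one with the highest production per million inhabitants, which is the kind of contrast the two figures are meant to expose.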

Traditional breeding approaches are generally slow, labor-intensive, and costly processes. Recent progress in genetics and genomics, however, has been accompanied by the development and deployment of novel tools, techniques, and approaches that could be used to enhance plant breeding programs. Molecular markers, genetic linkage maps, marker assays, and whole-genome sequences have been developed and published for many crop species, including several vegetables (Singh, Citation2007). Marker-assisted selection (MAS) (Collard and Mackill, Citation2008), marker-assisted backcrossing (MABC) (Collard and Mackill, Citation2008), marker-assisted recurrent selection (MARS) (Charmet et al., Citation1999), and genomic selection (GS) (Heffner et al., Citation2009), which can be used for precision breeding, are in various stages of development for each vegetable crop, depending on available resources and the complexity of the species genetics and breeding. Various approaches, such as linkage mapping (Tanksley, Citation1993), genome-wide association mapping (GWAS) (Thornsberry et al., Citation2001), nested association mapping (NAM) (Tian et al., Citation2011), and multi-parent advanced generation inter-cross (MAGIC) populations (Cavanagh et al., Citation2008), have been developed for the detection and mapping of genes and QTLs. Though the novel genetic and genomic tools and techniques were applied primarily and frequently in major cereal crops, such as maize, rice, and wheat (Simko, Citation2015), they are gradually finding their way into vegetable genetics and breeding. The current review focuses on seven vegetable crops to provide up-to-date information on available genomic resources, the use of genetic and genomic tools and techniques in breeding programs, and the major anticipated areas of future research for each crop.

II. Tomato

The cultivated tomato (Solanum lycopersicum L.), a diploid species (2n = 2x = 24 chromosomes), is one of the world's most important vegetable crops by economic standards and consumption values. In 2018, tomato production worldwide reached nearly 182 million metric tons and US$ 47.7 billion gross production value, second only to potato (S. tuberosum L.) among all vegetable crops (FAOSTAT, Citation2020). Worldwide, there are more varieties of tomato sold than any other vegetable crop (Foolad and Panthee, Citation2012). Although a tropical species, tomato is grown in almost every corner of the world. The top tomato-producing countries include China (33.8%), India (10.6%), United States (6.9%), Turkey (3.6%), and Egypt (3.6%) (FAOSTAT, Citation2020). Tomato is also an essential dietary component in many countries, including the United States (Valpuesta, Citation2002). Although tomato fruit is generally not considered high in nutritional value, it ranks first among all fruits and vegetables as a major dietary source of vitamins (A and C), minerals (Rick, Citation1980), and phenolic antioxidants (Vinson et al., Citation1998) in the U.S.; this is due mainly to its large consumption volume (USDA, Citation2012). Lycopene is a key carotenoid predominantly found in tomatoes, which provides the red color in fruit. Both lycopene and β-carotene (also found in tomato fruit) have been shown to be important antioxidants, and their consumption has been correlated with lower risks of certain cancers (Johnson, Citation2002).

The breeding history of tomato dates back to the 1930s, when improvement of its overall horticultural characteristics started. Tomato has been bred with substantial diversity in plant type, size and growth habit, and fruit shape, size, color, and taste. The majority of tomato cultivars on the market are currently separated into the fresh market (FM) and processing (PROC) types. Fresh market tomatoes, including large beefsteak/slicer, plum/roma, campari, cherry, and grape types, are mainly sold and consumed fresh. Processing tomatoes are usually peeled, cubed, juiced, or sauced to make canned products. Breeding objectives for FM and PROC tomatoes differ considerably, although both share the goal of higher yield per unit area. Other major breeding priorities common to both types include resistance/tolerance to various biotic (e.g., diseases and insects) and abiotic stresses (e.g., salt, cold, and drought), adaptability to the changing climate, maturity, and plant type for specific production regions (e.g., warm vs. temperate) and conditions (e.g., greenhouse vs. open field), and harvest need. Major specific traits of interest in FM tomato breeding include fruit size, shape, color, firmness, the internal structure (e.g., locule size and number), uniformity, appearance, shelf-life, taste, and flavor. Major specific traits of interest in PROC tomato breeding include determinate and compact growth habit, concentrated fruit set and ripening for once-over machine harvest, jointless pedicel for ease of harvest, and fruit characteristics, such as firmness, color, pH, titratable acidity, soluble and insoluble solids, and viscosity (Stevens and Rick, Citation1986; Tigchelaar, Citation1986; Foolad and Panthee, Citation2012).

A. Genetic resources, mapping populations, genetic maps, and genome sequences

Tomato, along with the major vegetable crops potato and eggplant, resides in the diverse Solanum genus in the Solanaceae family. In the tomato Solanum section Lycopersicon clade, there are 13 closely related taxa (species), including the cultivated tomato S. lycopersicum L. and its 12 related wild species: S. arcanum Peralta, S. cheesmaniae (L. Riley) Fosberg, S. chilense (Dunal) Reiche, S. chmielewskii (C.M. Rick, Kesicki, Fobes & M. Holle) D.M. Spooner, G.J. Anderson & R.K. Jansen, S. corneliomulleri J.F. Macbr., S. galapagense S. Darwin & Peralta, S. habrochaites S. Knapp & D.M. Spooner, S. huaylasense Peralta, S. neorickii (C.M. Rick, Kesicki, Fobes & M. Holle) D.M. Spooner, G.J. Anderson & R.K. Jansen, S. pennellii Correll, S. peruvianum L., and S. pimpinellifolium L. (Knapp and Peralta, Citation2016). The Lycopersicon clade originated from the Andean regions, including Peru, Bolivia, Ecuador, Colombia, and Chile (Bauchet and Causse, Citation2012), although evidence for the exact location of tomato domestication is inconclusive, pointing to both Mexico and the Andean regions (Peralta and Spooner, Citation2005). The cultivated tomato species was estimated to contain only about 5% of the total genetic variation existing in all tomato species; this has been attributed to two major genetic bottlenecks and stringent selection that occurred during its domestication and early breeding (Miller and Tanksley, Citation1990). To compensate for the limited genetic diversity within the cultivated species, tomato breeding programs have utilized wild tomato accessions as germplasm resources for crop improvement as well as for genetic mapping and identification and introgression of desirable genes and QTLs. These include genetic factors for disease and insect resistance, abiotic stress tolerance, and improved fruit quality and nutritional values (Foolad, Citation2007; Bauchet and Causse, Citation2012; Foolad and Panthee, Citation2012).
To identify and map new genes and QTLs, mostly interspecific crosses between elite tomato breeding lines and accessions within the related wild species have been used to develop mapping populations, including early filial and backcross populations (e.g., F2 and BC1), backcross inbred lines (BILs), recombinant inbred lines (RILs) and near-isogenic lines (NILs), and construct genetic maps, as reviewed elsewhere (Foolad, Citation2007).

The first genetic linkage map of tomato was constructed with 153 morphological and physiological markers in 1968, which revealed all 12 tomato linkage groups (LGs) (Butler, Citation1968). The first molecular linkage map of tomato was published in 1986, using a combination of 18 isozymes and 94 RFLP markers (Bernatzky and Tanksley, Citation1986). The first “high-density” genetic map of tomato was published in 1992, which comprised 1,030 molecular markers (mostly RFLPs) (Tanksley et al., Citation1992). The development and advancement of other types of molecular markers, including AFLPs (Zabeau and Vos, Citation1993; Vos et al., Citation1995), SSRs (Tautz, Citation1989; He et al., Citation2003), RGAs (Zhang et al., Citation2002; Niño-Liu et al., Citation2003), ESTs (Adams et al., Citation1991), COSs (Fulton et al., Citation2002), RAPDs (Williams et al., Citation1990), SCARs (Paran and Michelmore, Citation1993), CAPSs (Konieczny and Ausubel, Citation1993), SNPs and InDels (Landegren et al., Citation1998), resulted in the construction and publication of additional tomato genetic linkage maps with significantly greater marker density. For example, the availability of a tomato SNP array containing 7,720 SNPs in 2012 (Hamilton et al., Citation2012) resulted in the development of rather high-density genetic maps of tomato (with >3,000 SNP markers) based on three interspecific F2 mapping populations from S. lycopersicum × S. pennellii and S. lycopersicum × S. pimpinellifolium crosses (Hamilton et al., Citation2012). Due to limited marker polymorphisms within the cultivated species of tomato, few molecular maps have been constructed based on intraspecific crosses. Most recently, an “ultra-high density” tomato genetic map was constructed based on a S. lycopersicum × S. pimpinellifolium RIL population, which contained 141,083 SNP markers grouped into 2,869 genomic bins (Gonda et al., Citation2019). 
This map was also used for fine mapping of genes and QTLs related to fruit weight and lycopene content in tomato (Gonda et al., Citation2019). Another high-density tomato genetic bin map, consisting of 1,195 genetic bins (8,470 SNPs), has recently been constructed using genotyping-by-sequencing (GBS) and a different S. lycopersicum × S. pimpinellifolium RIL population, which has been used for fine mapping of the late blight resistance gene Ph-5 in tomato (Jia, Citation2019).
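
The bin maps described above collapse many SNPs into far fewer genomic bins. A minimal sketch of one way such binning can work is given here: adjacent markers whose genotype pattern across the RILs is identical, meaning no recombination was observed between them, are merged into one bin. This is a simplified assumption for illustration. The cited studies' actual procedures are not described in this review, and real pipelines must also handle missing calls and genotyping errors. The function name and the toy data are hypothetical.

```python
def bin_markers(markers):
    """Merge adjacent markers with identical genotype patterns into bins.

    markers: list of (position, pattern) tuples sorted by position, where
    each character of pattern is the parental allele ('A' or 'B') carried
    by one RIL at that marker.
    Returns a list of bins as (start, end, pattern, marker_count) tuples.
    """
    bins = []
    for position, pattern in markers:
        if bins and bins[-1][2] == pattern:
            # Same pattern as the open bin: no recombination seen, extend it.
            start, _, _, count = bins[-1]
            bins[-1] = (start, position, pattern, count + 1)
        else:
            # Pattern changed: at least one RIL recombined, open a new bin.
            bins.append((position, position, pattern, 1))
    return bins


# Toy chromosome: five SNPs scored in four RILs (hypothetical data).
toy_markers = [
    (100, "AABB"),
    (250, "AABB"),
    (400, "ABBB"),
    (900, "ABBB"),
    (950, "ABBA"),
]

bins = bin_markers(toy_markers)
for start, end, pattern, count in bins:
    print(start, end, pattern, count)
```

Here five markers reduce to three bins. Mapping then proceeds on bins rather than on individual SNPs, which is why a map can carry far more markers than genetically distinct positions.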

In 2012, the Tomato Genome Consortium published the first high-quality tomato genomic sequence of an inbred PROC tomato cultivar, Heinz 1706 (The Tomato Genome Consortium, Citation2012). An improved version of the tomato reference genome assembly (SL4.0) was released in 2019, assembled de novo from PacBio long reads and scaffolded with Hi-C contact maps (Hosmani et al., Citation2019). The new reference assembly removed 11 Mb of contig gaps from a previous assembly, resulting in a total size of 782.6 Mb with 71.77% repeat content. The updated annotation of the tomato genome, ITAG4.0, reported a total of 34,075 protein-coding genes using RNA-seq, resistance gene enrichment sequencing (RenSeq), and other forms of expression data (Hosmani et al., Citation2019). The concurrent release of the tomato reference genome (The Tomato Genome Consortium, Citation2012) and the development of the GBS method (Elshire et al., Citation2011; Poland et al., Citation2012) further revolutionized genetic mapping in tomato for traits such as fruit, leaf, and root characteristics as well as disease resistance (Fulop et al., Citation2016; Celik et al., Citation2017; Ohlson et al., Citation2018; Xie et al., Citation2019).

B. Mapped genes and QTLs

Molecular markers and genetic maps of tomato have been used extensively for identification, mapping, and characterization of genes and QTLs for many agriculturally important traits, including resistance/tolerance to biotic and abiotic stresses, flower- and fruit-related characteristics, plant type, maturity, and yield. A previous review tabulated most of the genes and QTLs that had been identified and genetically mapped on tomato chromosomes for the various traits until 2012 (Foolad and Panthee, Citation2012). Since then, molecular markers associated with additional genes or QTLs for many traits in tomato have been reported. Below, we present some of the most important genes and QTLs used for breeding purposes in tomato (Table 1).

Table 1. Major tomato (Solanum lycopersicum L.) genes and QTLs used in marker-assisted breeding for resistance against fungal, bacterial, viral and nematode diseases.

The cultivated tomato is affected by more than 200 fungal, bacterial, viral, and nematode diseases (Lukyanenko, Citation1991). Host plant resistance has been the main focus of many tomato breeding programs around the world and has resulted in the identification, genetic mapping, and utilization of resistance genes or QTLs for many diseases, including Fusarium wilt (caused by Fusarium oxysporum), Verticillium wilt (Verticillium albo-atrum), tomato leaf mold (Cladosporium fulvum), late blight (Phytophthora infestans), bacterial speck (Pseudomonas syringae), bacterial spot (Xanthomonas races T1-T4), tomato mosaic virus (ToMV), tomato yellow leaf curl virus (TYLCV), tomato spotted wilt virus (TSWV), and root-knot nematode (RKN; Meloidogyne spp.). To date, more than 20 resistance genes for fungal, bacterial, viral, and nematode diseases have been mapped and/or cloned in tomato (Table 1) (Foolad and Panthee, Citation2012; Causse and Grandillo, Citation2016).

As one of the most popular vegetable crops in the world by production and value (https://www.nass.usda.gov), tomatoes are grown in very diverse climatic conditions and thus require adaptation to various environmental stresses (Chaudhary et al., Citation2019). During the past few decades, many studies have been conducted to identify genes or QTLs conferring tolerance to abiotic stresses, including salt (Foolad and Jones, Citation1993; Foolad et al., Citation1997; Citation1998a; Citation2001; Foolad and Chen, Citation1999), cold (Foolad et al., Citation1998b; Truco et al., Citation2000; Liu et al., Citation2010; Citation2016), heat (Lin et al., Citation2010; Xu et al., Citation2017b; Wen et al., Citation2019b), and drought (Foolad et al., Citation2003; Albert et al., Citation2016; Diouf et al., Citation2020). Most of the reported QTLs cover large genomic regions, often contributed by the wild species of tomato, and it has been very challenging to utilize them in breeding programs. Therefore, little progress has been made in developing tomatoes with tolerance to abiotic stresses using the identified QTLs. An alternative and potentially promising approach to breeding for abiotic stress tolerance in tomato is the use of genetic engineering and production of transgenic tomatoes, as reviewed elsewhere (Gerszberg and Hnatuszko-Konka, Citation2017; Krishna et al., Citation2019). However, no transgenic tomato cultivar is currently available on the market, mainly due to poor consumer acceptance of genetically modified organisms (GMOs).

Other important genes and QTLs identified in tomato include those associated with plant growth habits, maturity, and fruit quality. For example, once-over machine harvest in PROC tomato production requires cultivars with determinate and compact plant type, concentrated fruit setting and ripening, very firm fruit, and easy-to-detach pedicel (jointless). The discovery, genetic mapping, and incorporation of the SELF-PRUNING (sp) gene, the compound inflorescence (s) gene, and the jointless (j-2) gene have greatly contributed to the success of the PROC tomato industry (Pnueli et al., Citation1998; Budiman et al., Citation2004; Lippman et al., Citation2008). The fresh market tomato industry has also benefited significantly from genetic mapping research, especially as it relates to fruit shape and quality (Capel et al., Citation2017; Celik et al., Citation2017; Gao et al., Citation2019; Gonda et al., Citation2019; Safaei et al., Citation2020), fruit colors of red, pink, yellow, green and purple (Fray and Grierson, Citation1993; Ronen et al., Citation2000; Barry and Giovannoni, Citation2006; Mes et al., Citation2008; Ballester et al., Citation2010), increased fruit lycopene content (Chen et al., Citation1999; Zhang and Stommel, Citation2000; Ashrafi et al., Citation2012; Kinkade and Foolad, Citation2013; Gonda et al., Citation2019), and extended fruit storage life (Kinzer et al., Citation1990; Moore et al., Citation2002; Vrebalov et al., Citation2002).

To overcome the difficulties of breeding for more complex traits in tomato, researchers have employed other emerging technologies to identify, map, and characterize the relevant genes and QTLs. For example, genome-wide association studies (GWAS) have been carried out to characterize complex traits, such as fruit flavor (Zhang et al., Citation2015; Zhao et al., Citation2019) and quality (Zhang et al., Citation2016; Phan et al., Citation2019), metabolic attributes (Sauvage et al., Citation2014), and agronomic characteristics (Shirasawa et al., Citation2013; Bauchet et al., Citation2017). Combining RNA-seq with QTL mapping is another approach that has been used to fine map QTLs or identify candidate genes in QTL regions in many agriculturally important crops, including tomato (Muktar et al., Citation2015; Cui et al., Citation2017; Yang et al., Citation2018a). The RenSeq technique (Jupe et al., Citation2013) has also been a useful approach to discovering disease resistance genes and their associated markers, and it has been applied in tomato to sequence NBS-LRR (nucleotide-binding site, leucine-rich repeat) gene enriched libraries (Jupe et al., Citation2013; Andolfo et al., Citation2014). In general, recent genomic technologies have greatly enhanced our ability to map and discover new genes and QTLs in tomato.
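
The core idea of a GWAS scan can be sketched as a marker-by-marker test of association between allele dosage and phenotype across a diverse panel. The example below ranks markers by the squared correlation between dosage (0, 1, or 2 copies of one allele) and a trait value. It is a deliberately naive illustration under stated assumptions: it makes no correction for population structure or kinship and computes no significance levels, both of which published analyses normally include. The marker names, dosages, and phenotypes are hypothetical.

```python
from statistics import mean


def r_squared(dosages, phenotypes):
    """Squared Pearson correlation between allele dosage and phenotype."""
    mean_x, mean_y = mean(dosages), mean(phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait carries no signal
    return (sxy * sxy) / (sxx * syy)


def scan(genotypes, phenotypes):
    """Rank markers from strongest to weakest association."""
    scores = [(r_squared(d, phenotypes), name) for name, d in genotypes.items()]
    return sorted(scores, reverse=True)


# Hypothetical panel of six accessions scored at three SNPs.
phenotype = [1.0, 1.2, 2.0, 2.1, 3.0, 3.1]
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2],  # dosage tracks the trait closely
    "snp_b": [0, 2, 1, 0, 2, 1],  # dosage largely unrelated to the trait
    "snp_c": [1, 1, 1, 1, 1, 1],  # monomorphic in this panel
}

ranking = scan(genotypes, phenotype)
for score, name in ranking:
    print(name, round(score, 3))
```

A marker that ranks highly in such a scan is only a candidate. Linkage disequilibrium can extend the signal across neighboring markers, which is one reason association results are often combined with linkage mapping or expression data.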

C. Marker-assisted selection and genomic selection

Although tomato was among the first crop plants for which genetic markers and maps were utilized for breeding purposes (Tanksley, Citation1983), until the early 1980s almost all tomato breeding programs relied mainly on phenotypic selection (PS). With the discovery of high throughput and more breeder-friendly genetic markers, including PCR-based markers and SNPs, there has been an increased interest in the use of markers to facilitate tomato crop improvement. A review of the literature indicates that although markers have been identified for the most important disease resistance traits in tomato, not all reported markers have been verified or are readily applicable in tomato breeding. Yet, MAS is employed frequently in most tomato breeding programs for gene incorporation and stacking, especially when breeding cultivars for multiple disease resistance traits. For example, SCAR, CAPS, and other PCR-based markers are frequently used in most private and public tomato breeding programs when selecting for many of the major-gene disease resistance traits (specific marker information is summarized elsewhere; Foolad and Panthee, Citation2012; Lee et al., Citation2015). Genetic markers are also used routinely for various other purposes, including testing hybrid purity and screening breeding populations for plant types and fruit quality characteristics. However, markers are not typically employed when breeding for complex traits, including polygenic disease resistance traits (e.g., bacterial canker and early blight), abiotic stress tolerance, yield, and many fruit quality characteristics.
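
A short calculation shows why markers matter for gene stacking. Assuming an F2 population, Mendelian segregation, and unlinked loci (simplifying assumptions made here for illustration, not conditions stated in the cited work), a plant is homozygous for the desired allele at one locus with probability 1/4, and at k loci with probability (1/4)^k. The sketch below computes that fraction and the smallest population in which at least one such plant is expected with a chosen probability. The function names are ours.

```python
import math


def stacked_fraction(num_loci):
    """Expected F2 fraction homozygous for the target allele at every locus.

    Assumes unlinked loci and Mendelian segregation (1/4 per locus).
    """
    return 0.25 ** num_loci


def min_population(num_loci, confidence=0.95):
    """Smallest F2 size giving at least one stacked plant with the given
    probability, from 1 - (1 - f)^n >= confidence."""
    fraction = stacked_fraction(num_loci)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


for k in (1, 2, 3):
    print(k, stacked_fraction(k), min_population(k))
```

The required population grows quickly with each added gene. Genotyping seedlings with markers identifies the few stacked individuals directly, and it distinguishes homozygous from heterozygous carriers of dominant resistance genes, which phenotypic screening of the same generation cannot do.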

Now that ultra-high density genetic linkage maps are available in tomato (Sim et al., Citation2012; Gonda et al., Citation2019), breeders need not rely only on individual markers associated with traits of interest, as in MAS. Instead, they may use all available marker data for the breeding material in a process known as genomic selection (GS), in which a model trained beforehand predicts the breeding value of a line or a population more accurately (Meuwissen et al., Citation2001; Goddard and Hayes, Citation2007). For example, in a study comparing GS model-based selection with PS when breeding for multiple bacterial spot resistance genes in tomato, a training population was developed by intercrossing six diverse parents with different bacterial spot resistance, and the progeny underwent inbreeding to model inbred line development (Liabeuf et al., Citation2018). The population was genotyped with the SolCAP chip array, and different GS models were fitted to obtain genomic estimated breeding values (GEBV). After cross-validation, the authors concluded that the GS models predicted breeding values of both inbred progeny and hybrids more accurately than PS (Liabeuf et al., Citation2018). In another study, the efficiency of using GS for tomato fruit quality prediction was estimated, and it was determined that marker density, as well as population size and structure, affected the accuracy of GEBV (Duangjit et al., Citation2016). Although GS has been successfully implemented in animal breeding and several other crop species, it has yet to be extensively examined and utilized in tomato breeding.
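
The training and prediction steps of GS can be illustrated with ridge regression, one common family of GS models in which all marker effects are estimated jointly and shrunk toward zero. The sketch below is a minimal example under stated assumptions. The review does not specify which models the cited studies used, the penalty value is arbitrary, and the genotypes and phenotypes are hypothetical. It uses only the standard library, so the small linear solver is written out by hand.

```python
def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def train_ridge(genotypes, phenotypes, penalty):
    """Estimate marker effects from (Z'Z + penalty * I) b = Z'(y - mean)."""
    n, p = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n))
            + (penalty if j == k else 0.0) for k in range(p)]
           for j in range(p)]
    rhs = [sum(z[i][j] * (phenotypes[i] - mu) for i in range(n))
           for j in range(p)]
    return mu, col_means, solve(lhs, rhs)


def gebv(candidate, mu, col_means, effects):
    """Genomic estimated breeding value of one genotyped candidate."""
    return mu + sum((g - m) * b
                    for g, m, b in zip(candidate, col_means, effects))


# Hypothetical training set: six inbred lines, three markers coded 0 or 2.
train_genotypes = [[0, 0, 0], [2, 0, 0], [0, 2, 0],
                   [0, 0, 2], [2, 2, 0], [2, 2, 2]]
train_phenotypes = [10.0, 12.0, 11.0, 10.0, 13.0, 13.0]

mu, col_means, effects = train_ridge(train_genotypes, train_phenotypes, 1.0)
good = gebv([2, 2, 0], mu, col_means, effects)  # unphenotyped candidates
poor = gebv([0, 0, 2], mu, col_means, effects)
print(effects, good, poor)
```

The candidates are ranked from their genotypes alone, which is the practical appeal of GS. In practice, accuracy would be assessed by cross-validation within the training population, as in the studies cited above.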

D. Future outlook

Marker-assisted selection has transformed tomato breeding during the past few decades by providing breeders with foreknowledge of traits during the seedling stage, allowing them to make selections early and with greater precision and accuracy. However, it would be presumptuous to state that MAS or other genomic approaches will completely replace PS in tomato breeding in the near future. Nonetheless, agricultural sciences are moving ahead into the era of omics discoveries, including genomics, transcriptomics and proteomics, and tomato breeders need to take advantage of the abundantly available omics data and use them for more targeted and accelerated breeding. There are indications that this is happening in tomato genetics and breeding. For example, most recently, a pan-genome study in tomato, using genome sequences of 725 diverse accessions, discovered 4,873 novel genes, which were missing from the tomato reference genome; this study also identified a novel rare allele regulating fruit flavor (Gao et al., 2019). Another pan-genome study on 100 diverse tomato lines captured 238,490 structural variants (SV), many of which have major impacts on gene expression and epistasis involved in fruit flavor, size, production, and harvest traits (Alonge et al., Citation2020). Both pan-genome studies unveiled opportunities for further advancement in genetic mapping and innovative breeding in tomato. One of the major challenges in PS, and to some extent in MAS, is linkage drag, which is the unwanted transfer of undesirable linked genes from wild species into the cultivated tomato genetic background. To alleviate this issue, the Nobel Prize-winning technique of gene editing via CRISPR/Cas has provided a solution in some cases by targeted modification of desired traits using gene disruption or replacement; the use of this technique in tomato has been reviewed elsewhere (Rothan et al., Citation2019).
It is expected that this technology will be highly useful to tomato breeders for targeted and accelerated crop improvement. There are also recent efforts to use gene editing to target promoter regions of several tomato genes associated with important complex agronomic characteristics in order to induce beneficial quantitative variation, which could be utilized in breeding (Rodriguez-Leal et al., Citation2017). Furthermore, Zsögön et al. (Citation2018) reported de novo domestication of a wild tomato species by editing only six critical genes involved in tomato domestication.
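
The pan-genome results mentioned above rest on a presence/absence comparison of gene content across accessions. As a minimal sketch, assuming each accession has already been reduced to a set of gene identifiers (a large simplification of real pan-genome construction, which works from assemblies and read mappings), the pan-genome is the union of all sets, the core genome is their intersection, and genes absent from the reference are a set difference. All identifiers below are hypothetical.

```python
def pan_and_core(gene_sets):
    """Return (pan-genome, core genome) from per-accession gene sets."""
    sets = list(gene_sets.values())
    pan = set().union(*sets)          # genes present in any accession
    core = set.intersection(*sets)    # genes present in every accession
    return pan, core


def absent_from_reference(gene_sets, reference):
    """Genes found in the panel but missing from the reference accession."""
    pan, _ = pan_and_core(gene_sets)
    return pan - gene_sets[reference]


# Hypothetical gene content of a reference and two other accessions.
toy_panel = {
    "reference": {"g1", "g2", "g3"},
    "acc_A": {"g1", "g2", "g4"},
    "acc_B": {"g1", "g3", "g5"},
}

pan, core = pan_and_core(toy_panel)
novel = absent_from_reference(toy_panel, "reference")
print(sorted(pan), sorted(core), sorted(novel))
```

Genes in the last group cannot be detected by mapping reads or markers to the reference alone, which is why a pan-genome extends the set of loci available for genetic mapping and breeding.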

Some of the most important traits that tomato breeders currently focus on are disease resistance, environmental stress tolerance (in particular tolerance to heat and drought), fruit quality and shelf-life characteristics, as well as traits allowing for mechanical harvest of FM tomatoes. While good progress has already been made, it is conceivable that tomato breeding in the next decade will address most of these complex traits through a combination of PS, MAS, multi-omics-based approaches, genomic selection, gene editing, and genetic transformation.

III. Pepper

Pepper, belonging to the genus Capsicum of the family Solanaceae, is an important vegetable and spice crop worldwide. Believed to have originated in Bolivia (Perry et al., Citation2007), the genus Capsicum comprises ∼35 species, including the five economically important cultivated species Capsicum annuum L., C. frutescens L., C. baccatum L., C. chinense Jacq., and C. pubescens Ruiz & Pav. Capsicum species are all diploid, generally with 24 chromosomes (2n = 2x = 24), although many wild species carry 26 chromosomes. Pepper exhibits wide variation in morphological and yield-related characteristics, including plant architecture, flowering time, fruit size, shape, color, and phytochemical contents, and in resistance/tolerance to biotic and abiotic stresses. Pepper can grow in almost all soil types, but a well‐drained, moisture‐retaining loamy soil is most desirable. The optimum temperature for pepper seed germination is 25 to 30 °C, whereas that for plant growth and fruit development ranges from 18 to 30 °C. In 2018, worldwide pepper production was ∼59.5 million metric tons on a total area of ∼4.6 million hectares (FAOSTAT, Citation2020). Apart from being used as a vegetable, pepper has a wide range of uses in the food, pharmaceutical, and cosmetics industries.

A. Genetic and genomic resources

Capsicum possesses abundant genetic resources and a rich gene pool (Barchenger et al., Citation2019), with several mutant populations and germplasm collections available for genetic and breeding studies (Paran et al., Citation2007; Jeong et al., Citation2012; Arisha et al., Citation2015; Gu et al., Citation2019; Pereira-Dias et al., Citation2019; Solomon et al., Citation2019; Siddique et al., Citation2020). Capsicum annuum is the most important species economically, and breeding programs have focused mainly on improving its resistance to pests and diseases. Landraces and wild relatives of the cultivated species are the major sources of genetic resistance to numerous pepper diseases; however, successful introgression of desirable traits from wild relatives into C. annuum has been constrained by considerable cross incompatibilities (Onus and Pickersgill, Citation2004). Selective and conscious use of wild resources is essential for the continuous improvement of the cultivated pepper.

The extent of genetic diversity within the Capsicum genus has been analyzed using molecular markers (Zewdie et al., Citation2004; Nicolaï et al., Citation2013; Lee et al., Citation2016a); however, many of these studies included relatively small numbers of accessions or populations held in gene banks. A broader and more in-depth evaluation and characterization of Capsicum genetic resources is needed, including the determination of genetic diversity within and between species.

The first genetic linkage map of Capsicum was reported in 1984, constructed based on an interspecific population derived from a cross between C. annuum and C. chinense (Tanksley, Citation1984). Since then, additional linkage maps with higher marker density and better genome coverage have been developed using both intra- and interspecific mapping populations, such as F2, BC1, RILs, and DHs (Lefebvre et al., Citation1995; Kang et al., Citation2001; Lefebvre et al., Citation2002; Paran et al., Citation2004; Han et al., Citation2016a; Lee et al., Citation2016c). Several intraspecific C. annuum mapping populations have been utilized for mapping of disease-resistance genes (Lefebvre et al., Citation1995; Lefebvre et al., Citation2002). Further, a few integrated and comparative genetic linkage maps have been developed and published (Prince et al., Citation1993; Livingstone et al., Citation1999; Jahn et al., Citation2000; Rehrig et al., Citation2014; Han et al., Citation2016a).

A high‐quality pepper reference genome is essential for further advancing molecular genetics research and promoting genomics-assisted breeding activities. Draft genome assemblies of C. annuum “Criollo de Morelos 334” (CM334), C. annuum Zunla, and C. annuum var. glabriusculum were reported in 2014 with comparable genome coverage (Kim et al., Citation2014; Qin et al., Citation2014). Resequencing of two other C. annuum lines, “Dempsey” (a large bell-type genotype) and “Perennial” (a genotype with small, elongated fruit), resulted in the development of an ultra-high density linkage map of pepper (Han et al., Citation2016a). Subsequently, efforts were made to obtain improved genome assemblies for two other domesticated species, C. chinense accession “PI159236” and C. baccatum accession “PBC81” (Kim et al., Citation2017d). However, these pepper genome assemblies have been based on short-read sequencing approaches, which have inherent limitations, such as low continuity and low coverage of transposable elements, which hamper genome-based gene identification. More recently, genomes of four Italian sweet pepper landraces, which are important pre-breeding resources, were re-sequenced using Illumina short-reads-based technology (Acquadro et al., Citation2020), and a genome assembly of an intraspecific C. annuum F1 hybrid was produced (Hulse-Kemp et al., Citation2018). Further, to explore the genetic variability and diversity of Capsicum species, a pepper pan-genome was constructed based on resequencing 383 accessions, comprising 355 C. annuum, 4 C. baccatum, 11 C. chinense, and 13 C. frutescens accessions (Ou et al., Citation2018). Further improvements in genome assemblies and genetic and genomic resources are expected to facilitate a better understanding of Capsicum genome architecture and accelerate pepper crop improvement.

B. Mapped genes and QTLs

Pepper genetic resources with different origins have been used extensively for mapping many agronomic traits, including resistance to pathogens, male sterility (MS), pungency, and morphological traits. Numerous genetic loci governing important traits have been identified and closely linked markers developed (Table 2).

Table 2. Genes and QTLs for which marker-assisted selection is conducted in pepper (Capsicum species).

A wide range of pathogens can affect pepper, causing considerable yield and fruit quality losses. One of the most devastating pathogens of pepper is the oomycete Phytophthora capsici, which causes leaf blight and root rot. Capsicum annuum “CM334” is an important source of resistance to P. capsici, and has been utilized by pepper breeders. Several QTLs for resistance to various P. capsici isolates have been detected in different C. annuum genetic backgrounds (Kim et al., Citation2008; Naegele et al., Citation2014; Rehrig et al., Citation2014; Siddique et al., Citation2019). A QTL on chromosome 5 appears to be a major genetic factor involved in resistance to P. capsici (Mallard et al., Citation2013; Liu et al., Citation2014; Rehrig et al., Citation2014; Siddique et al., Citation2019). Recently, PhR10, a single dominant gene for resistance to P. capsici race 3 (Byl4), was mapped to the long arm of pepper chromosome 10 (Xu et al., Citation2016a). Root-knot nematode (RKN) is another important pepper pathogen, causing considerable yield losses. The RKN resistance genes, Mech1, Mech2, Me1, Me3, Me4, Me7, and N, conferring resistance to several Meloidogyne spp., have been mapped to pepper chromosome 9 (Djian-Caporalino et al., Citation2001; Wang and Bosland, Citation2006; Djian-Caporalino et al., Citation2007; Wang et al., Citation2009; Fazari et al., Citation2012; Uncu et al., Citation2015; Bucki et al., Citation2017; Changkwian et al., Citation2019), and a QTL conferring resistance to M. javanica co-localizes with the Me gene cluster on chromosome 9 (Barbary et al., Citation2016).

Several QTLs conferring resistance to anthracnose, caused by Colletotrichum spp., have been identified in C. chinense and linked markers have been developed (Voorrips et al., Citation2004; Pakdeevaraporn et al., Citation2005; Kim et al., Citation2010; Mahasuk et al., Citation2016). Resistance resources, including C. baccatum PBC80 and PBC81 and C. chinense PBC932, have been utilized to introgress resistance into susceptible C. annuum genetic backgrounds (Yoon et al., Citation2006; Cremona et al., Citation2018). Accession PBC80 carries a dominant (Co5) and a recessive (co4) gene mapped to chromosomes 12 and 9, respectively. The anthracnose resistance locus AnRGO5, located on chromosome 5 (Sun et al., Citation2015), has recently been fine mapped (Zhao et al., Citation2020). Genes and QTLs associated with resistance to powdery mildew, caused by Leveillula taurica, have been identified and mapped (Lefebvre et al., Citation2003; Jo et al., Citation2017). Assays based on molecular markers associated with Verticillium resistance have been developed for breeding purposes. The resistance-linked markers were identified through a comparative analysis of Capsicum and the tomato Verticillium wilt resistance genes Ve1 and Ve2 (Barchenger et al., Citation2017). Several dominant loci, including Bs1, Bs2, Bs3, Bs4, and Bs7, conferring resistance to bacterial spot caused by Xanthomonas spp., have been identified and mapped (Wai et al., Citation2015). Further, major QTLs associated with resistance to Ralstonia bacterial wilt have been mapped to chromosomes 1 and 10 (Mimura et al., Citation2009; Du et al., Citation2019).

Pepper is affected by a wide range of viral pathogens, and numerous viral resistance genes have been identified in wild and cultivated pepper genotypes. Molecular marker assays have been developed based on several potyvirus resistance genes, including pvr1 or pvr2, pvr3, Pvr4/Pvr7, pvr5, pvr6, and pvr8 (Kang et al., Citation2005; Ruffel et al., Citation2005; Yeam et al., Citation2005; Venkatesh et al., Citation2018). Several alleles of the pvr1 and pvr6 genes were identified using eco-tilling in cultivated Capsicum accessions (Ibiza et al., Citation2010). The dominant tomato spotted wilt virus (TSWV) resistance gene, Tsw, on the distal portion of chromosome 10 has been cloned (Jahn et al., Citation2000; Kim et al., Citation2017c), and the chili veinal mottle virus (ChiVMV) resistance locus Cvr1 has been mapped to chromosome 6 (Lee et al., Citation2013; Citation2017). A dominant resistance gene, Cmr1, conferring resistance to cucumber mosaic virus (CMV) was identified from the C. annuum cultivar “Bukang” and mapped to chromosome 2 (Kang et al., Citation2010). In addition, several QTLs associated with resistance to CMV have been identified on chromosomes 5, 11, and 12 (cmv 12.1) (Ben Chaim et al., Citation2001a; Caranta et al., Citation2002; Yao et al., Citation2013). The cmr2 gene confers resistance to a broad range of CMV strains, including the Cmr1 resistance-breaking strains CMVKorean and CMVFNY (Choi et al., Citation2018). Using a specific locus amplified fragment sequencing (SLAF-seq) approach, a single gene (CA02g19570) located on chromosome 2 was identified as the candidate for qCmr2.1, which provides resistance to CMVFNY (Guo et al., Citation2017).

Both genic male sterile (GMS) and cytoplasmic male sterile (CMS) systems are utilized for hybrid seed production in pepper (Jo et al., Citation2016; Jeong et al., Citation2018). Several molecular markers linked to genes controlling GMS, such as msk (Lee et al., Citation2010a), ms1 (Lee et al., Citation2010b; Jeong et al., Citation2018), ms3 (Lee et al., Citation2010c; Naresh et al., Citation2018), ms8 (Bartoszewski et al., Citation2012), msw (Naresh et al., Citation2018), and msc1 (Cheng et al., Citation2018), have been developed and used in MAS for hybrid seed production. However, CMS remains the preferred system for hybrid seed production provided that CMS is stable and restorer genes/factors (Rfs) are available (Swamy et al., Citation2017). Cytoplasmic-genic male sterile (CGMS) systems have been used successfully in hot pepper seed production. However, many sweet pepper lines are poor restorers (Lin et al., Citation2007), which limits the successful use of CGMS in sweet pepper. Rf is the most investigated restorer gene (Min et al., Citation2009; Jo et al., Citation2016), and a SCAR marker, CRF-SCAR, linked to the Rf locus has been successfully deployed in MABC to introgress the Rf allele into sweet pepper genotypes (Gulyas et al., Citation2006). Fine mapping of the Rf locus located on chromosome 6 has been performed, and closely linked markers have been developed (Jo et al., Citation2016).

The genetic and molecular aspects of fruit secondary metabolite composition, particularly fruit pungency, have been studied extensively in pepper (Blum et al., Citation2003; Stewart et al., Citation2005; Ben-Chaim et al., Citation2006; Yarnes et al., Citation2013; Eggink et al., Citation2014; Nimmakayala et al., Citation2016; Lee et al., Citation2016b; Park et al., Citation2019). Pungency in peppers is caused by capsaicinoid compounds, the presence of which is primarily regulated by capsaicin synthase, encoded by the Pun1 gene (Stewart et al., Citation2005). Many non-pungent peppers contain non-functional Pun1 alleles with a deletion (pun1), frameshift mutation (pun1²), or premature stop codon (pun1³) (Stewart et al., Citation2005; Stellari et al., Citation2010). Loss of function of another gene, pAMT, causes an extreme reduction in capsaicinoid content (Lang et al., Citation2009; Tanaka et al., Citation2010). The Pun2 locus regulating pungency levels was identified in C. chacoense (Stellari et al., Citation2010). The pun2 allele is proposed to be the ortholog of gene cap (Blum et al., Citation2003). The Pun3 locus encoding the CaMYB31 transcription factor (Arce-Rodríguez and Ochoa-Alejo, Citation2017; Han et al., Citation2019) and a putative ketoacyl-ACP reductase (CaKR1) gene (Koeda et al., Citation2019) controlling pungency levels have also been identified. Structural genes involved in capsaicinoid biosynthesis, such as 3‐keto‐acyl‐ACP synthase (Kas), phenylalanine ammonia-lyase (Pal), and thioesterase (Fat), have been known for a long time; however, allelic variations affecting capsaicinoid biosynthesis have not been identified. In addition to QTLs for pungency, a QTL for fruit flavor (the strong odor of C. baccatum) was identified on pepper chromosome 3 using an interspecific cross derived from C. annuum and C. baccatum (Eggink et al., Citation2014). Minor QTLs associated with sensory traits have also been detected on different chromosomes (Eggink et al., Citation2014).

The color of ripe pepper fruit is determined mainly by carotenoids, whereas immature fruit color is determined by chlorophylls and anthocyanins. Both quantitative and qualitative genetic factors are known to be involved in the variation of pepper fruit pigmentation (Jeong et al., Citation2019; Jang et al., Citation2020). Based on the inheritance of pepper fruit color variation, a three-locus model (C1, C2, and Y) has been proposed (Hurtado-Hernandez and Smith, Citation1985), with the C2 and Y loci encoding phytoene synthase and capsanthin-capsorubin synthase, respectively (Popovsky and Paran, Citation2000; Huh et al., Citation2001). Recently, it was demonstrated that the C1 locus, encoding pseudo-response regulator 2 (PRR2), is responsible for the white color of immature fruit in pepper (Jeong et al., Citation2020). Mutations in carotenoid biosynthesis pathway genes, such as those encoding lycopene β-cyclase (LCYB), lycopene ɛ-cyclase (LCYE), β-carotene hydroxylase (CrtZ), capsanthin-capsorubin synthase (CCS), and zeaxanthin epoxidase (ZEP), cause color variation from red to yellow or orange (Popovsky and Paran, Citation2000; Thorup et al., Citation2000; Huh et al., Citation2001; Borovsky et al., Citation2013; Tian et al., Citation2014). The major QTLs controlling immature fruit color, pc10.1 and pc8.1 (pc1), correspond to the pepper GOLDEN2-like (GLK2) and LSD ONE LIKE1 (LOL1) transcription factors, respectively (Brand et al., Citation2012; Citation2014; Jeong et al., Citation2020). Several QTLs regulating immature fruit color variation have been detected on chromosomes 10, 11, and 12 (Han et al., Citation2016a).

Yield-related factors, such as the number of fruit per plant and fruit size and weight, have not been well-explored in pepper. However, QTLs for fruit diameter and pericarp thickness, and fruit length and weight (FW) have been identified (Ben Chaim et al., Citation2001b; Citation2003; Barchi et al., Citation2009; Yarnes et al., Citation2013; Dwivedi et al., Citation2015), including FW QTLs fw2.1, fw3.2, and fw4.2 (Ben Chaim et al., Citation2001b; Rao et al., Citation2003; Zygier et al., Citation2005). Minor QTL clusters underlying FW, shape (FS) and diameter, pericarp thickness, and the number of locules have been located on chromosomes 11 and 12 (Barchi et al., Citation2009). Two QTLs, fs3.1 and fs10.1, for fruit elongation have been identified (Ben Chaim et al., Citation2003; Borovsky and Paran, Citation2011). Major QTLs underlying FS within C. annuum have been detected on chromosomes 1, 3, and 4 in multiple populations (Ben Chaim et al., Citation2001b; Citation2003; Barchi et al., Citation2009; Yarnes et al., Citation2013; Dwivedi et al., Citation2015; Han et al., Citation2016a). Fruit shape QTLs have also been detected in several interspecific crosses, including a cross between C. annuum and C. chinense, on chromosomes 1, 3, 4, and 10 (Ben Chaim et al., Citation2003; Zygier et al., Citation2005; Borovsky and Paran, Citation2011; Yarnes et al., Citation2013). The QTLs fs2.1, FrSHP2.1, and fs3.1 are major-effect QTLs for FS and are located on chromosomes 2 and 3 (Ben Chaim et al., Citation2001b; Citation2003; Rao et al., Citation2003; Zygier et al., Citation2005; Barchi et al., Citation2009; Borovsky and Paran, Citation2011; Mimura et al., Citation2012; Yarnes et al., Citation2013; Dwivedi et al., Citation2015; Hill et al., Citation2017; Chunthawodtiporn et al., Citation2018). Furthermore, genes related to shoot architecture, including CaBLIND (Jeifetz et al., Citation2011), CaJOINTLESS (Cohen et al., Citation2012), CaFASCICULATE (Elitzur et al., Citation2009), and CaS (Cohen et al., Citation2014) have been identified using ethyl methanesulfonate (EMS)-induced mutants. QTLs affecting trichome density have also been identified in CM334 on chromosome 10 (Kim et al., Citation2011; Chunthawodtiporn et al., Citation2018).

C. Marker-assisted selection and genomic selection

Over the past two decades, pepper breeding has mainly focused on the genetic improvement of hot and sweet peppers by incorporation of pest and disease resistance. Recent developments in next-generation sequencing (NGS) and high-throughput genotyping approaches have facilitated the rapid discovery of SNP markers in Capsicum spp. High-density genetic linkage maps for various populations, mostly F2 or DH, are being published. Sequence variations, including SNPs and InDels, can be easily identified using high-throughput sequencing, and genotyping can be readily performed using several platforms (Cheng et al., Citation2016; Hulse-Kemp et al., Citation2016; Nimmakayala et al., Citation2016; Han et al., Citation2016a). Among the many NGS-based methods, GBS is a simple, rapid approach that has been used in biparental QTL mapping and GWAS (Han et al., Citation2016a; Han et al., Citation2018; Siddique et al., Citation2019).

Among all the modern breeding tools, molecular marker technology has shown the most significant development and utility over the last two decades. Multiple marker datasets based on various marker types, including RFLPs, RAPDs, AFLPs, SCARs, SSRs, CAPS, and HRM-PCR, are now available for Capsicum researchers, along with high-throughput genotyping platforms. Marker development has become less expensive with the use of publicly available genome sequences (Kim et al., 2014; Qin et al., 2014; Kim et al., 2017d). The low cost of identifying SNPs distributed throughout the genome allows their use for QTL mapping, GWAS, or pinpointing a target region, facilitating high-resolution mapping of QTLs and MAS. Several trait-linked markers have been developed for MAS and are being utilized in pepper breeding programs, including allele-specific CAPS markers for the pvr1, pvr1¹, pvr1², and pvr2 genes (Kang et al., Citation2005; Yeam et al., Citation2005; Rubio et al., Citation2008; Holdsworth and Mazourek, Citation2015). Several closely linked markers for resistance to important diseases in pepper, including those caused by P. capsici, pepper mottle virus (PepMoV), TSWV, and anthracnose, have also been developed for MAS (Moury et al., Citation2000; Hoang et al., Citation2013; Holdsworth and Mazourek, Citation2015; Kim et al., Citation2017c; Zhao et al., Citation2020). Bs1, Bs2, and Bs3 resistance genes have been introgressed into several commercial pepper cultivars. Marker-assisted gene pyramiding of Bs5 and Bs6 has conferred broad-spectrum resistance against Xanthomonas spp. (Vallejos et al., Citation2010). Major QTLs for resistance to Ralstonia bacterial wilt have been mapped to chromosome 1 (from Capsicum accession LS2341), linked to SSR marker CAMS451 (Mimura et al., Citation2009), and chromosome 10 (from C. annuum BVRC1), linked to marker ID10-194305124 (Du et al., Citation2019).
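The practical cost of marker-assisted gene pyramiding depends largely on how many plants must be genotyped to recover one carrying all target alleles. The sketch below is an illustration only, not a procedure taken from the cited studies. It assumes unlinked loci segregating independently in an F2 population, with the desired homozygous class at each locus expected at a frequency of 1/4. The function name and its parameters are hypothetical.

```python
import math


def min_f2_population_size(n_loci, prob_success=0.95, per_locus_freq=0.25):
    """Smallest number of plants to genotype so that, with probability
    `prob_success`, at least one plant carries the target genotype at
    every one of `n_loci` loci.

    Assumptions (illustrative): loci are unlinked and segregate
    independently; `per_locus_freq` is the expected frequency of the
    target genotype at one locus (1/4 for one homozygous class in an F2).
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    if not 0.0 < prob_success < 1.0:
        raise ValueError("prob_success must lie strictly between 0 and 1")
    if not 0.0 < per_locus_freq < 1.0:
        raise ValueError("per_locus_freq must lie strictly between 0 and 1")

    # Probability that a single plant has the target genotype at all loci.
    q = per_locus_freq ** n_loci
    # Solve 1 - (1 - q)^n >= prob_success for the smallest integer n.
    return math.ceil(math.log(1.0 - prob_success) / math.log(1.0 - q))


# Example: plants to screen when pyramiding one, two, or three loci.
sizes = {k: min_f2_population_size(k) for k in (1, 2, 3)}
```

Under these assumptions the required population size grows roughly fourfold with each additional locus, which is one reason why pyramiding is usually carried out stepwise or with marker-based preselection. Linkage between target loci, whether in coupling or repulsion, would change these numbers.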

Recently, GS was investigated for fruit-related traits in pepper using 351 accessions from the pepper core collection as a training population (Hong et al., Citation2020). Various conditions were tested for effective GS, including different genomic prediction models and the number of markers. Genomic selection models were tested using a RIL population and produced moderate prediction accuracies of 0.34, 0.48, 0.32, and 0.50 for fruit shape, weight, length, and width, respectively. This study demonstrated the potential use of GS as a tool for improving fruit-related characteristics. Although only moderate prediction accuracies were obtained in this initial study, further improvements in the accuracy of genomic prediction are expected from integrating larger-scale genomics, GWAS, and phenomics platforms (Hong et al., Citation2020).
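To make the notion of prediction accuracy concrete, the following minimal sketch fits a ridge-regression marker model (in the spirit of RR-BLUP) to a training set and reports accuracy as the Pearson correlation between predicted and observed phenotypes in a separate validation set. It is not the pipeline of Hong et al. (2020). All data are simulated, and the number of markers, number of QTLs, heritability, and shrinkage parameter `lam` are arbitrary assumptions; only the training-set size of 351 is taken from the text, for scale.

```python
import numpy as np


def fit_ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects, computed in the dual form
    beta = X' (X X' + lam * I)^-1 y, which is convenient when the number
    of markers is large relative to the number of lines."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


def prediction_accuracy(y_obs, y_pred):
    """Prediction accuracy as the Pearson correlation coefficient."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def simulate_population(n_lines, n_markers, n_qtl, h2, rng):
    """Simulate allele dosages (0/1/2) and a phenotype with heritability h2."""
    X = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
    effects = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(size=n_qtl)
    g = X @ effects                                   # true genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)     # residual SD implied by h2
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return X, y


rng = np.random.default_rng(1)
X, y = simulate_population(n_lines=501, n_markers=300, n_qtl=20, h2=0.7, rng=rng)

# Training set of 351 lines; the remaining 150 lines form the validation set.
X_train, y_train = X[:351], y[:351]
X_test, y_test = X[351:], y[351:]

# Center markers and phenotype using training-set means only.
mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
beta = fit_ridge_marker_effects(X_train - mu_x, y_train - mu_y, lam=50.0)

y_pred = mu_y + (X_test - mu_x) @ beta
accuracy = prediction_accuracy(y_test, y_pred)
```

In practice the shrinkage parameter would be estimated, for example by cross-validation or from variance components, rather than fixed. Accuracy also depends on the relatedness between the training and validation material, which this simulation does not model.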

D. Future outlook

Capsicum genetic and breeding research has seen considerable progress during the last decade. Breeding programs are taking advantage of rapid progress in the precision and speed of NGS technologies. Reduced representation sequencing approaches, including GBS, DArTseq, and SLAF-seq, have allowed identification and analysis of a large number of genetic loci through high-throughput genome screening at a relatively low cost (Li et al., Citation2018; Naresh et al., Citation2018; Du et al., Citation2019; Siddique et al., Citation2019; Tamisier et al., Citation2020). Using C. annuum reference genomes (Qin et al., 2014; Kim et al., Citation2017b), genes governing economically important traits, such as disease resistance, pungency, male sterility, and morphological characteristics have been positioned (Jo et al., Citation2016; Nimmakayala et al., Citation2016; Han et al., Citation2016a; Cheng et al., Citation2018; Kim et al., Citation2017c). The availability of Capsicum reference genomes and their annotation data enables comparative analyses of results from multiple studies on the same traits, increasing the power of candidate gene identification.

Genomic tools, resources, and approaches are at various stages of development and application in pepper breeding programs. While MAS is being routinely employed, GS remains at an early stage of development. With ideal resource development and allocation, GS could be applied to pepper breeding for accurate estimation of hybrid performance. Since the release of the first draft genomes of pepper (Qin et al., 2014; Kim et al., 2014), GWAS has been employed for genetic analysis of traits, such as capsaicinoid content (Nimmakayala et al., Citation2016; Han et al., Citation2018), bacterial spot resistance (Potnis et al., Citation2019), P. capsici root rot resistance (Siddique et al., Citation2019), potato virus Y (PVY) resistance (Tamisier et al., Citation2020), peduncle length (Nimmakayala et al., Citation2016), and several other fruit-related characteristics (Nimmakayala et al., Citation2016; Colonna et al., Citation2019; Lee et al., Citation2020). Significant haplotypes detected in GWAS-QTL studies will serve as a unique molecular tool for developing robust markers for crop improvement. In view of the rapid progress in genomics and sequencing technologies, we anticipate that studies deploying whole‐genome-sequencing approaches, including QTL‐seq and MutMap, will facilitate pepper crop improvement and allow a comprehensive understanding of the structure and function of genes involved in various physiological processes. Further functional assays of candidate genes identified in these studies will provide additional targets for genetic improvement of important traits in pepper through crop breeding. Although a number of candidate genes conferring pest and disease resistance traits have been identified in pepper, many of them have not been cloned or functionally characterized. One obstacle hampering functional genomic studies in pepper is the paucity of efficient genetic transformation protocols.

The use of reference genomes in plant breeding and other related research is highly dependent on their accessibility and quality. Recent sequencing efforts (Kim et al., 2017d; Hulse-Kemp et al., Citation2018) underscore the need for further improvement of the currently available Capsicum genomic resources. Improved versions and pan-genomes of Capsicum are now becoming a reality due to rapid technical advances in DNA sequencing technologies and the reduced cost of long-read sequencing. In the near future, high-quality reference genomes and genetic tools with greater accessibility will enable the investigation of complex biological questions and expedite trait discovery in pepper.

IV. Eggplant

Eggplant (Solanum melongena L., 2n = 2x = 24), also known as brinjal or aubergine, is a member of the Solanaceae family and the third most widely grown Solanaceous vegetable after potato and tomato. China, India, and Iran are the leading producing countries, and Egypt, Turkey, and Italy are the main producers in the Mediterranean region. The global production of eggplant is around 54 million metric tons annually, valued at over US$10 billion (FAOSTAT, Citation2020). Eggplant fruit (berry) is low in calories and is considered a healthy vegetable due to its content of vitamins, minerals, and bioactive compounds, such as anthocyanins in the skin and chlorogenic acid (CGA) in the flesh (Gürbüz et al., Citation2018). The CGA content varies among cultivars and is influenced by fruit developmental stage, storage conditions, and environmental factors (Mennella et al., Citation2012; Plazas et al., Citation2013). Oxidation of CGA by polyphenol oxidases is responsible for the browning of the fruit flesh after cutting. Eggplant also contains some anti-nutritional compounds, including saponins and steroidal glycoalkaloids (α-solamargine and α-solasonine). There are no guidelines on maximum healthy levels of glycoalkaloids in eggplant; however, it has been reported that they may also play a health-promoting role, such as inhibiting the growth of cancer cells in vitro and in vivo (Friedman, Citation2015).

Several non-exclusive theories have been proposed regarding the origin of eggplant. Unlike its congeners tomato and potato, which are native to Central and South America, eggplant is native to the Old World. The general consensus is that eggplant was domesticated from S. insanum independently in the Indian subcontinent and China (Ali et al., Citation2011; Cericola et al., Citation2013; Page et al., Citation2019), with a possible further center of domestication in the Philippines (Meyer et al., Citation2012). Around the eighth century, eggplant spread eastward to Japan, then westward into South-East Asia and Africa, and was then introduced to the Mediterranean Basin and subsequently to America (Prohens et al., Citation2005).

Since eggplant is a self-pollinating plant, a large part of its current cultivation relies on the use of inbred lines and, increasingly in recent years, F1 hybrids (Kumar et al., Citation2020). Eggplant cultivars are generally classified into three major groups: elongated, semi-elongated, and round berries (Hurtado et al., Citation2013). However, the cultivated germplasm displays extensive variation in fruit shape and size, and in Asia, some popular varieties are small-fruited and often classified as S. ovigerum (Meyer et al., Citation2019). The fruit peel color ranges from white to various shades of purple (due to a variable concentration of anthocyanin), to green (due to the presence of chlorophyll), to dark purple (due to both anthocyanin and chlorophyll). Varieties characterized by white fruit color with violet stripes are also present in the market.

Two other Solanum species are also known as eggplant and commonly grown in sub-Saharan Africa: the scarlet eggplant (S. aethiopicum L.) and the gboma eggplant (S. macrocarpon L.), with which S. melongena is fully cross-compatible. Scarlet eggplant is an important vegetable in Central and West Africa, but it is also cultivated in the Caribbean and Brazil as well as in some areas of South Italy. It includes four main inter-fertile cultivar groups: “Aculeatum,” which is mainly used as an ornamental, “Gilo” grown for its fruit, “Kumba” produced for both its fruit and leaves, and “Shum” for its leaves. Gboma eggplant is also a morphologically variable species exploited for both its fruit and leaves, but it is less widespread and mainly cultivated in the forest regions of Coastal Africa and the Congo River (Plazas et al., Citation2014; Acquadro et al., Citation2017).

A. Genetic resources

The taxonomy and identification of wild eggplant relatives are challenging due to a large number of related species. Based on cross-hybridization and molecular data, the S. melongena primary gene pool (GP1) comprises cultivated eggplant and its wild progenitor S. insanum. The GP2 includes scarlet and gboma eggplants and their wild relatives S. anguivi and S. dasyphyllum, respectively, as well as >40 other wild species with which eggplant can be inter-crossed (Plazas et al., Citation2014). The GP3 includes more distantly related species, which can be hybridized with eggplant only by applying specific breeding techniques, such as embryo rescue or hybrid polyploidization (Rotino et al., Citation2014). Among the eggplant wild relatives, S. aethiopicum, S. linnaeanum, S. sisymbriifolium, S. aculeatissimum, and S. torvum represent major sources of disease resistance, including resistance to Verticillium wilt, one of the most devastating fungal diseases of eggplant (Plazas et al., Citation2016). Resistance to other diseases and pests of eggplant, including bacterial wilt (Xi’ou et al., Citation2015), Ralstonia (Lebeau et al., Citation2011), Fusarium wilt (Boyaci et al., Citation2012), leafhoppers, aphids, and eggplant fruit and shoot borer (Rotino et al., Citation1997), has also been identified in other wild relatives. S. insanum and S. incanum exhibit drought tolerance (Ranil et al., Citation2017), and S. incanum also possesses certain phenolics in the fruit that are absent in the cultivated eggplant (Ma et al., Citation2011).

The World Vegetable Center in Taiwan holds the world's largest public collection of the cultivated eggplant and its wild relatives, maintaining more than 3,000 accessions from 90 countries (Taher et al., Citation2017). Large collections are also maintained at the Plant Genetic Resources Conservation Unit, USDA-ARS, Griffin, GA, USA, the Center for Genetic Resources at the Wageningen University & Research, The Netherlands, the Vavilov Research Institute of Plant Genetic Resources in Russia, the National Bureau of Plant Genetic Resources in India, the Institute of Vegetables and Flowers in China (GENESYS, Citation2020) and the French National Institute for Agricultural Research (INRA) in Avignon, France (Daunay et al., Citation2000).

B. Mapped genes and QTLs

The first RFLP-based genetic map of eggplant was developed based on an F2 population (n = 58 individuals) of a cross between S. melongena and S. linnaeanum (Doganlar et al., Citation2002). The map was subsequently improved by including 110 COSII markers, which were previously mapped in tomato (Wu et al., Citation2009), and used for locating QTLs controlling morphological traits, including leaf lobing, leaf prickles, and prickle anthocyanin (Frary et al., Citation2014). A more complete genetic map was then developed by increasing the number of individuals (n = 108) and markers (Doğanlar et al., Citation2014).

An interspecific F2 population of 48 individuals from a S. melongena × S. linnaeanum (=S. sodomeum) cross was also used to develop a RAPD/AFLP-based genetic map, in which two QTLs for Verticillium wilt resistance were located (Sunseri et al., Citation2003). Another interspecific map, based on 91 BC1 individuals of a S. melongena × S. incanum cross and 242 markers (COSII, SSRs, AFLPs, CAPS, and SNPs) and encompassing 1,085 cM, was later developed (Gramazio et al., Citation2014). Based on synteny of this map with the tomato genetic map, six candidate genes involved in the biosynthesis of chlorogenic acid, five polyphenol oxidase genes, and genes affecting fruit shape (OVATE, SlSUN1) and prickliness were located on the twelve identified LGs.

The first intraspecific genetic linkage map of eggplant, published in 2001, was based on 168 F2 individuals and 181 RAPD and AFLP markers (Nunome et al., Citation2001). This map was used to identify QTLs for fruit shape as well as fruit stem and calyx pigmentation. Another intraspecific eggplant genetic map, published in 2010, was based on 238 molecular markers and 141 F2 individuals derived from a cross between the breeding lines “305E40” (resistant to F. oxysporum due to introgressed Rfo-sa1 locus from S. aethiopicum) and “67/3” (Barchi et al., Citation2010). An intraspecific map of eggplant based on an F6 RIL population of a cross between a Ralstonia solanacearum (RS) resistant line (“AG91-25”) and a susceptible line (“MM738”) was used to locate a major dominant resistance gene, ERs1 (Lebeau et al., Citation2013). Subsequently, this map was enriched with additional markers and used to identify one major phylotype-specific QTL and two broad-spectrum QTLs for resistance to RS (Salgon et al., Citation2017). Two additional intraspecific genetic maps of eggplant, based on two F2 populations derived from crosses between two non-parthenocarpic lines (“LS1934” and “Nakate-Shinkuro”) and a parthenocarpic line “AE-P03” were developed, integrated, and used for comparative analysis with the tomato genome using a set of 326 common markers (Fukuoka et al., Citation2012). The F2 maps were also used to identify QTLs for parthenocarpy, and two contributing QTLs, Cop3.1 and Cop8.1, were mapped onto chromosomes 3 and 8, respectively (Table 3); subsequently, Cop8.1 was confirmed in a RIL population (Miyatake et al., Citation2012). However, in all the above-mentioned genetic maps, the identified QTLs often encompassed large genetic intervals, corresponding to several Mb on the physical map, which limits their introgression via MAS due to potential linkage drag. Only recently, the semi-dominant Prickle (Pl) locus on chromosome 6, which causes the absence of prickles, was fine mapped using a linkage map based on an F2 population derived from a cross between the prickle-free cultivar “Togenashi-senryo-nigo” and the prickly line “LS1934.” A 5-kb deletion within the Pl locus responsible for the prickle-free phenotype was identified, and primers for detecting the InDel, suitable for MAS of the trait, were developed (Miyatake et al., Citation2020).

Table 3. Major genes and QTLs identified in eggplant (Solanum melongena).

With the advent of NGS technologies, the development of higher-density genetic linkage maps and the identification of candidate genes have become a reality (Jaganathan et al., Citation2020). In the aforementioned F2 population derived from the intraspecific cross “305E40” × “67/3,” the application of RAD-sequencing identified ∼10,000 SNPs and 1,000 InDels, of which >2,000 SNPs were found to be potentially useful for genotyping via a GoldenGate assay (Barchi et al., Citation2011). This resulted in the development of the first post-NGS genetic map of eggplant, which included 415 SNP markers assigned to the 12 eggplant chromosomes. Subsequently, the map was used to locate QTLs for seven traits associated with anthocyanin content (Barchi et al., Citation2012), 20 fruit yield and morphological traits (Portis et al., Citation2014), and fruit qualitative traits (dry matter, sugars, and organic acids), chlorogenic acid, peel anthocyanins, and steroidal glycoalkaloids (Toppino et al., Citation2016). The most recent study based on this F2 population made it possible to locate major QTLs affecting response to Fusarium oxysporum and V. dahliae (Barchi et al., Citation2018).

Genome-wide association studies were performed on a set of 191 eggplant accessions comprising a mixture of breeding lines, old varieties, and landraces from Asia and the Mediterranean basin, and genotyped with 384 SNPs (Cericola et al., Citation2014, Portis et al., Citation2015). These studies validated a number of previously identified QTLs affecting anthocyanin pigmentation, as well as fruit and plant morphology; further, due to the wide genetic diversity that existed in the panel of genotypes, several new marker-trait associations were identified. Another association mapping study based on 219 SNPs applied to a set of 377 eggplant accessions identified five SNPs near the SUN and OVATE homologs of tomato, respectively encoding for a protein promoting fruit elongation and a protein playing a negative role in the growth and elongation of fruit (Liu et al., Citation2019b).
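The logic of a marker-trait association scan can be illustrated with a minimal sketch. It assumes biallelic SNPs coded as allele dosage (0, 1, 2) and a single quantitative trait, and it ranks markers by the squared correlation between dosage and phenotype. The marker names and values are hypothetical, and the sketch deliberately omits the corrections for population structure and kinship that association studies on diverse panels normally require; it is not the method used in the cited studies.

```python
def marker_r2(genotypes, phenotypes):
    """Squared Pearson correlation between allele dosage (0/1/2) and a trait."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    spp = sum((p - mean_p) ** 2 for p in phenotypes)
    if sgg == 0 or spp == 0:
        return 0.0  # monomorphic marker or constant trait: no signal
    sgp = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    return sgp ** 2 / (sgg * spp)


def scan(markers, phenotypes):
    """Rank markers by the share of trait variance each explains on its own."""
    scores = {name: marker_r2(g, phenotypes) for name, g in markers.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy panel: six accessions, three SNPs, one trait (e.g. fruit length).
trait = [1.0, 2.0, 3.0, 1.1, 2.1, 2.9]
panel = {
    "snp_A": [0, 1, 2, 0, 1, 2],  # dosage tracks the trait closely
    "snp_B": [0, 2, 1, 2, 0, 1],  # dosage unrelated to the trait
    "snp_C": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
}
for name, r2 in scan(panel, trait):
    print(f"{name}: r2 = {r2:.3f}")
```

In practice, a per-marker statistic of this kind would be converted to a p-value and corrected for multiple testing across all markers.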

An advancement in eggplant genetic map saturation was made in a study aimed at identifying QTLs associated with resistance to Ralstonia pseudosolanacearum. Following a GBS approach, a set of 1,370 SNPs was used to genotype 123 DH lines previously obtained from a cross between the susceptible “MM738” and resistant “EG203” lines. The identified QTLs were highly influenced by environmental conditions, but the two most stable QTLs were located on chromosomes 3 and 6 (Salgon et al., Citation2017). Recently, two highly saturated genetic maps of eggplant were reported. One was based on an F2 population (n = 121) of a cross between the eggplant line “1836” and an accession of S. linnaeanum. Using SLAF-seq, a map containing 2,122 SNPs was obtained and used to identify 19 QTLs associated with plant and fruit traits (Wei et al., Citation2020). The second map was developed from an intraspecific population of 163 F7 RILs from a cross between the eggplant breeding lines “305E40” and “67/3.” In this RIL population, the availability of a high-quality genome sequence of the line “67/3” (male parent) and resequencing of the line “305E40” (female parent), as well as low-coverage Illumina sequencing of the RILs, led to the identification of 7,249 SNPs assigned to the 12 eggplant chromosomes (Barchi et al., Citation2019a). The map, spanning 2,169.23 cM, had an average marker distance of 0.4 cM and has been utilized to determine the genetic bases of several traits related to anthocyanin content and seed vigor (Toppino et al., Citation2020). Since the fruit of the two parental lines (“305E40” and “67/3”) differ in the content of several metabolites belonging to the glycoalkaloid, anthocyanin, and polyamine classes, metabolic profiling of each RIL more recently made possible the identification of several metabolomic QTLs (mQTLs) associated with their accumulation (Sulli et al., Citation2021).

C. Marker-assisted selection and genomic selection

Genetics and genomics research in eggplant has lagged behind that in other Solanaceae crops, such as tomato, potato, and pepper. Although conventional breeding has resulted in many improved cultivars of eggplant, to date there is no reported example of eggplant varieties developed through the use of MAS. However, the recent availability of a high-quality eggplant genome sequence offers great opportunities for the rapid development of new molecular markers tightly linked to genes and QTLs of interest, which in turn would allow the application of genomic tools to develop new eggplant varieties more efficiently.

The first draft of the eggplant genome sequence, released in 2014 (Hirakawa et al., Citation2014), covered 833.1 Mb (N50 = 64 Kb), spanning 74% of the eggplant genome. This genome sequence, however, was highly fragmented and not anchored to the eggplant chromosomes. Furthermore, the number of predicted genes was 85,446, much larger than the number of genes (∼35,000) annotated in other sequenced diploid Solanaceae genomes. A new eggplant genome sequence, released in 2019 by an Italian Consortium for the RIL male parent “67/3,” was developed by combining Illumina and optical mapping approaches (https://solgenomics.net/organism/Solanum_melongena/genome). The quality of the hybrid assembly was comparable to those of tomato, potato, and pepper (1.22 Gb gapped and 0.92 Gb un-gapped sequence; N50 = 3.59 Mb). The gene annotation, assisted by RNA-Seq, resulted in 34,916 gene models, similar to those in other Solanaceae species, of which 28,425 were anchored. Furthermore, through the resequencing of the RIL female parent “305E40” and low-coverage Illumina sequencing of each RIL, scaffolds were anchored to the 12 eggplant chromosomes (Barchi et al., Citation2019a). Subsequently, based on previous assemblies, a highly contiguous S. melongena reference genome was obtained by using 3D chromosome conformation (Hi-C) information, resulting in a marked reduction of unanchored genes (Barchi et al., Citation2021).
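Because assembly contiguity is compared above through N50 values, a short sketch of how this statistic is computed may be useful. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The scaffold lengths below are hypothetical and are not taken from any eggplant assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (same unit as input)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This sequence is the one that pushes cumulative length past 50%.
            return length


# Hypothetical toy assembly, lengths in kb.
toy_scaffolds = [900, 700, 400, 300, 200, 100]
print(n50(toy_scaffolds))
```

A fragmented assembly made of many short sequences yields a small N50 even when its total length is large, which is why the statistic is reported alongside total assembly size.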

Recently, a high-quality chromosome-level genome assembly for the eggplant inbred line “HQ-1315” has also been published, which was obtained by a combination of Illumina, Nanopore, 10X genomics sequencing technologies, and Hi-C technology for genome assembly (Wei et al., Citation2020). The sequencing of a QTL affecting fruit length, located on chromosome 3, was performed and the gene Smechr0301963, belonging to the SUN gene family, was predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, 210 linkage markers associated with 71 traits were anchored to the eggplant chromosomes and 26 QTL hotspots were identified.

The recent availability of a high-quality eggplant genome sequence has fostered resequencing studies, which should yield further marker information and enhance the genetic mapping of agronomic traits. The first resequencing study included a comprehensive structural and functional characterization of seven diverse S. melongena accessions and one accession of the wild species S. incanum L. (Gramazio et al., Citation2019). By comparing the resequencing data with the high-quality reference genome, more than 10 million new polymorphisms were identified, including 1.3 million among the S. melongena accessions and over 9 million between S. melongena and S. incanum. This highlighted the narrow genetic diversity within the domesticated eggplant and showed that introgression from wild relatives could significantly broaden the genetic basis of cultivated eggplant. In another study, a draft genome sequence of the scarlet eggplant S. aethiopicum was published and 34,906 protein-coding genes were annotated (Song et al., Citation2019). In this study, resequencing of 65 S. aethiopicum and S. anguivi accessions resulted in the identification of more than 18 million SNPs, of which ∼34,000 were located within regions of disease resistance genes. Further, a pan-genome analysis of S. aethiopicum accessions identified 51,351 protein-coding genes, of which 7,069 were missing from the cultivated eggplant reference genome.

The high-throughput sequencing technologies make large amounts of data available, which when integrated with phenotypic information would facilitate the identification of traits and regions for pyramiding desirable alleles from both cultivated and wild relatives via MAS and also GS. The use of GS for complex traits would allow incorporating a large number of markers to model the performance of a genotype, thus avoiding the risk of losing contributions of multiple small-effect genes.
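The idea of modeling genotype performance from many markers at once can be sketched with ridge regression, one of several statistical models used for genomic prediction. The example below is only an illustration under stated assumptions: markers are coded −1/1 for the two homozygous classes, trait values are expressed as deviations from the mean, the shrinkage parameter `lam` is chosen arbitrarily, and the data and function names are hypothetical. The source does not specify any particular GS model for eggplant.

```python
def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for the vector of marker effects b."""
    n, m = len(X), len(X[0])
    # Augmented system [X'X + lam*I | X'y], one row per marker.
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
         for j in range(m)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting; lam > 0 keeps pivots nonzero.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def predict(X, effects):
    """Genomic estimated breeding values: sum of marker codes times effects."""
    return [sum(x * b for x, b in zip(row, effects)) for row in X]


# Hypothetical training set: four lines genotyped at two markers.
X_train = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
y_train = [-2.5, -1.5, 1.5, 2.5]  # generated as 2.0*marker1 + 0.5*marker2
effects = ridge_marker_effects(X_train, y_train, lam=1.0)
print(effects)                               # shrunken toward zero relative to 2.0 and 0.5
print(predict([[1, 1], [-1, 1]], effects))   # predictions for two unphenotyped lines
```

All markers receive an effect estimate, each shrunk toward zero, so small-effect loci contribute to the prediction instead of being dropped by a significance threshold as in QTL-based MAS.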

D. Future outlook

Despite considerable progress made in the last decade in identifying the genetic bases of traits of agronomic interest, alternatives to traditional linkage or association mapping populations are desirable for increasing the precision of QTL mapping. In this regard, a good example is the recent development of a multi-parent advanced generation intercross (MAGIC) population obtained by crossing seven S. melongena accessions, chosen to maximize the phenotypic, genetic, and geographic variation of the material in cultivation, with a single S. incanum accession (Gramazio et al., Citation2019).

At present, eggplant wild relatives are poorly represented in genebanks and, with a few exceptions (Rotino et al., Citation2014; Liu et al., Citation2015), breeders have largely overlooked their potential value for eggplant breeding. It is necessary, therefore, to increase the number of wild eggplant accessions in the genebanks and conduct accurate genotyping and phenotyping for better evaluation of their potential use in developing new eggplant cultivars (Barchi et al., Citation2019b). In this respect, it is important to note the development of pre-breeding material obtained by crossing the cultivated eggplant with wild relatives (Kouassi et al., Citation2016), the development of an eggplant introgression line (IL) population using as donor parent S. incanum L. (Gramazio et al., Citation2017), and the recent characterization of a set of ILs carrying a single marker-defined chromosomal segment (Mangino et al., Citation2020). Research should be also focused on conducting phenotypic characterization of hybrids between eggplant and wild relatives from the primary and secondary gene pools and the evaluation of their heterotic ability for various yield, quality, and disease resistance traits (Kaushik et al., Citation2016, Citation2018).

Additional studies should be conducted to identify suitable rootstocks for eggplant grafting, so as to improve the quality of eggplant cultivation by providing resistance/tolerance to soil pathogens and by inducing vigorous growth of the scions (Bletsos et al., Citation2003). Owing to their tolerance to abiotic and biotic stresses, eggplant wild relatives such as S. torvum and S. sisymbriifolium, as well as the interspecific hybrids S. melongena × S. aethiopicum, have been used for eggplant grafting (Gisbert et al., Citation2011; Moncada et al., Citation2013). Analysis of the genome-wide changes induced by DNA methylation in eggplants grafted onto two interspecific rootstocks revealed that, similar to heterotic hybrids, increased vigor of the scion is associated with changes in gene expression and reduced DNA methylation in the CHH context (Cerruti et al., Citation2021). In tomato, differentially expressed genes were detected between the transcriptomes of heterograft and homograft plants (Wang et al., Citation2019a), and future studies in eggplant should be aimed at deciphering the molecular interactions between scion and rootstocks.

At present, a large part of eggplant production relies on non-hybrid varieties; however, farmers’ interest in and preference for eggplant hybrids have increased remarkably in the last several years. F1 hybrids are available on the market, but further development of locally adapted hybrids with preferred fruit traits, high yield, and good adaptation is required and will be facilitated by the development of eggplant MS lines (Kumar et al., Citation2020). Previous studies reported examples of eggplant MS caused by recessive nuclear genes (Phatak et al., Citation1991) as well as CMS obtained by utilizing the cytoplasm of wild Solanum species (Khan and Isshiki, Citation2010; Citation2011; Citation2016). Furthermore, two independent dominant fertility restorer (Rf) genes have been discovered, sequenced, and found tightly linked to a SCAR marker (Hasnunnahar et al., Citation2012). More recently, through an RNA-seq approach, further genes and pathways related to MS in eggplant have been identified (Yang et al., Citation2018b; Li et al., Citation2019a). However, the molecular mechanism of MS is not yet fully understood and further effort is needed to integrate MS into various eggplant genetic backgrounds through MAS.

Parthenocarpy represents a key trait for eggplant breeding, as it makes it possible to obtain seedless fruit and to overcome the problem of low fruit yield under unfavorable environmental conditions. A spontaneous parthenocarpic mutant that was not associated with the Cop8.1 QTL for parthenocarpy (Miyatake et al., Citation2012) has been reported (Miyatake et al., Citation2020). Map-based cloning of the underlying gene, Pad-1, revealed that it is involved in auxin homeostasis during ovary development and that the mutated allele of the gene induces parthenocarpy. Furthermore, the suppression of its orthologous genes also induced parthenocarpy in tomato and pepper (Matsuo et al., Citation2020). This result is of great interest for the future development of parthenocarpic genotypes in cultivated eggplant varieties and should be confirmed in a wider range of genotypes, also through the application of recently available biotechnological approaches, such as gene knock-out based on the CRISPR/Cas9 technique (Saini and Kaushik, Citation2019). The latter has recently been adopted to edit polyphenol oxidase genes involved in the browning of the fruit flesh after cutting (Maioli et al., Citation2020). Further improvements of the technique, as well as optimized protocols for eggplant in vitro regeneration, are needed before this approach can be applied routinely for eggplant improvement.

V. Lettuce

Lettuce (Lactuca sativa L.), a self-fertilized diploid species (2n = 2x = 18) from the Asteraceae (Compositae) family, is grown mainly in moderate climates in many countries around the world. Lettuce leaves are frequently consumed raw as a salad or a sandwich filling, although in some cuisines, the leaves or stems are also cooked, pickled, dried, or stir-fried. The world’s lettuce production totaled 27.3 million metric tons in 2018 (the most of any leafy vegetable), with the majority being produced in China (57.0%), U.S. (13.5%), India (4.5%), Spain (3.4%), and Italy (2.8%) (FAOSTAT, Citation2020).

Lactuca sativa is the only cultivated species from the genus Lactuca, which includes ∼100 species, most of which are indigenous to Asia and Africa. Results of RNA sequencing indicate that the single domestication event of lettuce occurred ∼10,800 years ago (YAGO) in the Fertile Crescent, and L. serriola (prickly lettuce) is the progenitor of the cultivated lettuce (Zhang et al., Citation2017). Domestication was marked by the loss of seed shattering, probably caused by spontaneous mutation(s) (Wei et al., Citation2021). Besides L. serriola, other closely related species (L. aculeata, L. altaica, L. azerbaijanica, L. dregeana, L. georgica, and L. scarioloides) are considered to be in the primary gene pool for the domesticated lettuce. Lactuca saligna (willow-leaf lettuce from the secondary gene pool) and L. virosa (bitter lettuce from the tertiary gene pool) are also sexually compatible with the cultivated lettuce; however, the viability of the resulting offspring is limited (Zohary, Citation1991; Lebeda et al., Citation2007; Citation2009).

Lettuce cultivars display extensive variation in leaf color, shape, and texture. Based on the shape and size of the head, the shape, size, and texture of leaves, stem length, and seed size, lettuce cultivars are commonly divided into several horticultural types: crisphead (frequently split into iceberg and Batavia), butterhead, romaine (cos), leaf (cutting), Latin (grassé), stem (stalk), and oilseed. Leaves of all types except stem and oilseed are typically eaten raw; the stem type is cultivated mainly for edible stems that are eaten raw or cooked, and the oilseed type is used for the production of cooking oil from its relatively large seeds (Simko et al., Citation2014a). Since the 1990s, a dramatic expansion has been observed in the use of lettuce for fresh-cut processing and extended storage of the product in modified atmosphere packaging (MAP) (Glaser et al., Citation2001). Lettuce plants used for fresh cut (usually romaine, iceberg, or leaf) are harvested either as mature rosettes, which are then cut or shredded before packaging, or as whole leaves of ∼30-day-old seedlings that are packaged as baby‐leaf or spring‐mix salad (Hayes and Simko, Citation2016).

A. Genetic resources

The divergence of stem type from its ancestral cultivated lettuce occurred ∼1,900 YAGO and those of butterhead, crisphead, and romaine ∼500 YAGO (Zhang et al., Citation2017). Currently, only a fraction of the genetic diversity found in lettuce is exploited in breeding programs, where the majority of new cultivars are derived from crossing elite, closely related germplasm within a certain type. Pedigree and DNA fingerprinting show that modern lettuce cultivars, particularly within iceberg and to a lesser extent within romaine type, are closely related, whereas leaf type cultivars have the highest genetic diversity (Mikel, Citation2007; Citation2013; Simko and Hu, Citation2008; Simko, Citation2009; Rauscher and Simko, Citation2013; Zhang et al., Citation2017). To increase genetic diversity and introduce novel loci into the cultivated lettuce, hybridization of elite cultivars with old heirloom varieties and related wild species has been common in pre-breeding programs. Such diversification of the primary gene pool may help develop cultivars with improved resistance or tolerance to biotic and abiotic stresses (Adhikari et al., Citation2019), better nutritional (Simko, Citation2019) and post-harvest qualities (Hayes and Simko, Citation2016; Damerum et al., Citation2020), improved water and nutrient use efficiencies (Simko, Citation2020a; Macias-González et al., Citation2021), and potentially cultivars that are less hospitable to human enteric pathogens (Simko et al., Citation2015b; Melotto et al., Citation2020).

The generation and phenotypic characterization of mapping populations is usually a lengthy process. Many genes of interest have been mapped in lettuce using a diverse set of populations, including F2 and early BC generations, RILs, BILs, NILs, and diversity panels. The majority of RIL populations used to identify QTLs have been based on intraspecific crosses between two different types of the cultivated lettuce, for example, iceberg × Batavia (Hayes et al., Citation2014a; Simko et al., Citation2015a), iceberg × butterhead (Hayashi et al., Citation2008), iceberg × romaine (Simko et al., Citation2009; Simko et al., Citation2011), Latin × Batavia (Mamo et al., Citation2019; Sandoya et al., Citation2019), and Batavia × leaf (Simko et al., Citation2013). Less frequently, however, mapping populations have also been developed from crosses between cultivars of the same type, for example, Batavia × Batavia (Sandoya et al., Citation2019) and iceberg × iceberg (Aruga et al., Citation2012; Jenni et al., Citation2013; Macias-González et al., Citation2019), or between two distinct genotypes of the wild species L. serriola (Bell et al., Citation2015). Various interspecific mapping populations have also been developed, including crosses between iceberg type and L. serriola (probably the most frequently used population in mapping studies) (Argyris et al., Citation2005; Simko et al., Citation2009; Truco et al., Citation2013), butterhead and L. serriola (Uwimana et al., Citation2012), and leaf type and L. saligna (Jeuken and Lindhout, Citation2002; Jeuken et al., Citation2008). Association studies have been performed on diversity panels, which included an assortment of horticultural types. Earlier studies used fewer accessions (∼100–300), genotyped with markers relatively sparsely distributed throughout the genome (Kwon et al., Citation2013; Lu et al., Citation2014; Walley et al., Citation2017) or located in only specific genomic regions (Simko et al., Citation2009; Simko et al., Citation2011; Inderbitzin et al., Citation2019). More recently, a diversity panel of almost 500 accessions genotyped with over 50,000 markers was used for GWAS of traits related to plant development, biomass production, post-harvest quality, and trait stability (Sthapit Kandel et al., Citation2020).

The first molecular linkage map of lettuce was constructed based on 66 F2 plants and 41 RFLP markers (Landry et al., Citation1987). The markers were distributed in nine LGs and covered ∼400 cM map distance of the lettuce genome. Progress in marker technologies facilitated the construction of higher-density linkage maps of lettuce and more complete coverage of the genome (Galeano et al., Citation2014). For example, ∼14,000 SNP markers were used to construct a high-density lettuce genetic map, covering 1,585 cM of the genome (Truco et al., Citation2013). This genetic map was developed using 213 F7:8 RILs derived from interspecific cross iceberg cv. Salinas × L. serriola accession US96UC23. The same RIL population was also used to develop a consensus genetic map of lettuce, integrating information from multiple populations (Truco et al., Citation2007), and has become the lettuce reference map (Simko et al., Citation2009; Citation2013; Rauscher and Simko, Citation2013; Hayes et al., Citation2014a).
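Map lengths such as those above are expressed in centimorgans, which are derived from observed recombination fractions through a mapping function. The sketch below shows the two most widely used functions, Haldane and Kosambi. Which function was used for the cited lettuce maps is not stated here, so the choice is an assumption made purely for illustration.

```python
from math import log


def haldane_cm(r):
    """Map distance in cM from recombination fraction r, assuming no interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM from recombination fraction r, allowing for interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For tightly linked markers the two functions agree closely and the distance in cM is roughly 100 times the recombination fraction; they diverge as markers become more widely spaced, which is one reason total map lengths from sparse and dense maps are not directly comparable.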

A reference assembly of the lettuce genome (iceberg cv. Salinas) has been generated and validated genetically (Reyes-Chin-Wo et al., Citation2017). The total length of the assembly is 2.38 Gb, covering ∼88% of the L. sativa genome and containing almost 39,000 annotated genes. The above-mentioned interspecific RIL population has been used for validation and anchoring of the assembly to the nine lettuce LGs. Sequencing revealed a highly repetitive nature of the lettuce genome due to a triplication that occurred around 66 million years ago (MYA) (Reyes-Chin-Wo et al., Citation2017). Whole-genome resequencing of 445 accessions from 13 Lactuca species has been used to identify lettuce domestication history, the genetic architecture of domestication traits, and detection of the introgression regions from the wild lettuce L. serriola into resistance gene clusters (Wei et al., Citation2021). In addition, more than 90 genomes of the cultivated lettuce and its wild relatives have been sequenced and assembled to some degree (Verwaaijen et al., Citation2018; Inderbitzin et al., Citation2019) (https://www.ncbi.nlm.nih.gov/bioproject/478460, https://www.ncbi.nlm.nih.gov/bioproject/412928). Whole-transcriptome analysis, with total RNA sequencing of 240 wild and cultivated lettuce accessions, has been applied to identify candidate loci involved in flavonoid biosynthesis (Zhang et al., Citation2017), and transcriptome analysis of a single cultivar led to the identification of a network of genes involved in interaction with soilborne fungus Rhizoctonia solani (Verwaaijen et al., Citation2019) and fungal necrotroph Botrytis cinerea (De Cremer et al., Citation2013).

B. Mapped genes and QTLs

Genes and QTLs for a large number of traits have been mapped in the lettuce genome using intraspecific and interspecific mapping populations. The majority of traits were related to resistance or tolerance to biotic and abiotic factors, and various plant characteristics and product qualities (Table 4).

Table 4. Major genes and QTLs positioned on the molecular linkage map of lettuce (Lactuca sativa).

Lettuce downy mildew (DM), caused by the oomycete Bremia lactucae, is economically the most important disease of the cultivated lettuce worldwide, and therefore numerous studies have focused on resistance to this disease. The disease can affect lettuce at any developmental stage, from young seedlings to mature plants. Over 50 Dm genes and resistance factors have been described, which generally provide complete resistance against specific isolates of B. lactucae (Simko et al., Citation2015a; Parra et al., Citation2016). Lettuce resistance to DM may also be polygenic (quantitative) in nature, with phenotypic reactions ranging from partial to near-complete resistance. A large number of loci for resistance to DM have been mapped for both race-specific and quantitative resistance phenotypes. Although the resistance loci have been located on all nine LGs except LG 6 (Paran et al., Citation1991; Paran and Michelmore, Citation1993; Maisonneuve et al., Citation1994; Jeuken and Lindhout, Citation2002; Jeuken et al., Citation2008; McHale et al., Citation2009; Zhang et al., Citation2009a; Citation2009b; Simko et al., Citation2013; Citation2015a; Parra et al., Citation2021), more than half of the loci were in major resistance clusters on LGs 1, 2 and 4 (Simko, Citation2013; Parra et al., Citation2016). These major resistance gene clusters span several Mb and frequently contain genes encoding NBS-LRR proteins (Meyers et al., Citation1998a; Citation1998b; Parra et al., Citation2016).

Other lettuce diseases caused by fungi or oomycetes, for which resistance genes have been mapped, are powdery mildew (caused by Golovinomyces bolayi, formerly known as G. cichoracearum), anthracnose (Microdochium panattoniana), Verticillium wilt (V. dahliae), lettuce drop (Sclerotinia minor), Fusarium wilt (F. oxysporum f. sp. lactucae), and root downy mildew (Plasmopara lactucae-radicis). Genomic locations have been identified for single resistance genes against V. dahliae race 1—“Vr1” (Hayes et al., Citation2011; Inderbitzin et al., Citation2019; Mamo et al., Citation2019), F. oxysporum race 2—“RRD2” and “qFOL1.2” (Aruga et al., Citation2012, Seki et al., Citation2021), and P. lactucae-radicis—“plr” (Kesseli et al., Citation1993), and for QTLs involved in resistance against V. dahliae race 2 (Sandoya et al., Citation2021), S. minor (Mamo et al., Citation2019), G. bolayi (Simko et al., Citation2014b), F. oxysporum race 1 (Michelmore et al., Citation2010), and M. panattoniana (McHale et al., Citation2009).

Lettuce loci for resistance against bacterial pathogens have been identified for corky root (Sphingomonas suberifaciens) strain CA1—“cor” (Moreno-Vázquez et al., Citation2003), and bacterial leaf spot (Xanthomonas hortorum pv. lactucae formerly known as X. campestris pv. vitians) including both race-specific resistance genes “Xar1” (Hayes et al., Citation2014b) and “Xcvr” (Wang et al., Citation2016a) and QTLs for quantitative resistance (Lu et al., Citation2014; Sandoya et al., Citation2019). Loci for resistance to viral diseases have been identified for lettuce mosaic virus (LMV)—“mo1” (Nicaise et al., Citation2003) and Mo-2 (McHale et al., Citation2009), lettuce dieback caused by tomato bushy stunt virus (TBSV), and lettuce necrotic stunt virus (LNSV)—“Tvr1” (Grube et al., Citation2005; Simko et al., Citation2009), turnip mosaic virus (TuMV)—“Tu” (Montesclaros et al., Citation1997), and big vein caused by mirafiori lettuce big-vein virus (MLBVV)—several QTLs (Hayes et al., Citation2010; Michelmore, Citation2010). Loci for resistance to insects have also been identified for lettuce root aphid (Pemphigus bursarius)—“Ra” (Wroblewski et al., Citation2007), currant-lettuce aphid (Nasonovia ribisnigri) (Walley et al., Citation2017), and leafminer (Liriomyza trifolii) (Kandel et al., Citation2021).

Several studies have been conducted to identify loci associated with post-harvest lettuce qualities, including the rate of salad deterioration in MAP (Hayes et al., Citation2014a; Simko et al., Citation2018; Sthapit Kandel et al., Citation2020), pinking, browning, overall discoloration (Atkinson et al., Citation2013), and shelf life (Zhang et al., Citation2007). In addition, many loci have been identified contributing to plant phenotypic appearance and overall performance, including content of chlorophylls, anthocyanins, carotenoids, phenolics, and antioxidants (Waycott et al., Citation1999; Zhang et al., Citation2007; Citation2017; Hayashi et al., Citation2012; Kwon et al., Citation2013; Damerum et al., Citation2015; Simko et al., Citation2016; Mamo et al., Citation2019), flower and seed color (Waycott et al., Citation1999; Kwon et al., Citation2013; Simko et al., Citation2013; Wang et al., Citation2016a), plant developmental rate, bolting and flowering (Michelmore, Citation2009; Hartman et al., Citation2012; Citation2013; Kwon et al., Citation2013; Mamo et al., Citation2019; Han et al., Citation2021; Rosental et al., Citation2021), leaf morphology (Michelmore, Citation2009; Kwon et al., Citation2013; Bell et al., Citation2015), spines (Wei et al., Citation2021) and biophysical properties (Zhang et al., Citation2007), plant dwarf phenotype (Waycott et al., Citation1999), biomass production and yield (Hartman et al., Citation2012; Uwimana et al., Citation2012; Sthapit Kandel et al., Citation2020), hybrid fitness (Hartman et al., Citation2013), seed shattering (Wei et al., Citation2021), germination and longevity (Argyris et al., Citation2005; Hayashi et al., Citation2008; Schwember and Bradford, Citation2010a; Citation2010b; Hartman et al., Citation2012), root architecture (Johnson et al., Citation2000), plant adaptability to drought and salinity (Hartman et al., Citation2014; Kumar et al., Citation2021), nitrogen and water use efficiency (Kerbiriou et al., Citation2016; Macias-González et al., Citation2021), sensitivity to triforine-containing pesticides (Simko et al., Citation2011), and resistance to tipburn (Jenni et al., Citation2013; Macias-González et al., Citation2019).

C. Marker-assisted selection and genomic selection

Lettuce has benefited from the application of new technologies to identify markers linked to genes or QTLs of interest and to develop assays for MAS in breeding programs. Because lettuce cultivars of different horticultural types are genetically separated into distinct subpopulations (Simko and Hu, Citation2008; Kwon et al., Citation2013; Rauscher and Simko, Citation2013; Zhang et al., Citation2017), a substantial consideration should be given to developing MAS assays that work across all types of lettuce, not only within specific types from which loci and markers were identified.

Publicly available molecular marker assays have been developed to detect race-specific alleles associated with resistance to downy mildew—“Dm1,” “Dm3,” “Dm17,” and “Dm18” (Paran and Michelmore, Citation1993; Maisonneuve et al., Citation1994), corky root strain CA1—“cor” (Moreno-Vázquez et al., Citation2003; Simko, Citation2013), LMV—“mo-1¹” and “mo-1²” (Nicaise et al., Citation2003; Simko, Citation2013), Fusarium wilt race 2—“RRD2” and “qFOL1.2” (Aruga et al., Citation2012, Seki et al., Citation2021), Verticillium wilt race 1—“Vr1” (Inderbitzin et al., Citation2019), and lettuce dieback—“Tvr1” (Simko et al., Citation2009). Assays have also been developed for SNP alleles that reliably predict the rate of salad deterioration in MAP—“qSL4” (Simko et al., Citation2018). Additional marker assays developed by the private sector for MAS in lettuce are not publicly available due to competing interests among companies (Simko, Citation2013) or patent protection (Thabuis et al., Citation2013; Walley et al., Citation2017). As to genomic selection in lettuce, limited testing has been performed using data for polygenic resistance to DM and a single locus (“qSL4”) associated with salad deterioration (Hadasch et al., Citation2016). The results indicated that, while the genomic prediction model outperformed the MAS model for resistance to DM, its predictive ability for salad deterioration was significantly lower than that of the model based on QTL-linked markers (Hadasch et al., Citation2016).

Besides gene mapping, population structure analysis, and MAS assays, molecular markers have also been applied to lettuce cultivar fingerprinting (Rauscher and Simko, Citation2013; Zhou et al., Citation2019) and hybrid identification (Lebeda et al., Citation2007, Patella et al., Citation2019a). Assays based on molecular markers could also be used to detect adulteration (Simko, Citation2016), though this type of product analysis has not yet been reported in lettuce.

D. Future outlook

As our understanding of genes involved in lettuce coloration improves, the development of lettuce germplasm with customized leaf color (Simko, Citation2020b) will become possible. A growing number of resistance genes from wild Lactuca species (Lebeda et al., Citation2014) will be introgressed into cultivated lettuce. Further detailed study of the lettuce genome is expected to lead to faster development of MAS assays, not only for monogenic traits but also for traits that are inherited polygenically. As to DM, the most important disease of lettuce, molecular markers have been identified for both major resistance Dm-genes and QTLs. To develop cultivars with improved and more durable resistance to DM, molecular markers should be used to facilitate the pyramiding of major resistance genes as well as QTLs in individual genotypes. An alternative approach to developing lettuce cultivars with effective resistance to DM (and/or other pathogens) is host-induced gene silencing (HIGS), in which small interfering RNAs (siRNAs) produced in the host plant move into the pathogen, where they silence its vital genes. In a recent study, stable transgenic lettuce plants expressing siRNAs that target vital genes of B. lactucae reduced growth of the pathogen and inhibited its sporulation (Govindarajulu et al., Citation2015). Studies have also been performed to modify the lettuce genome using the CRISPR/Cas9 technology (Bertier et al., Citation2018; Zhang et al., Citation2018a), which could be applied in the future to develop cultivars with more durable resistances. A combination of novel genome editing tools with the recent progress in automated plant phenotyping (Araus and Cairns, Citation2014; Simko et al., Citation2017) is expected to speed up the process of developing lettuce cultivars and breeding lines with desired sets of traits.

One of the most challenging issues currently facing the lettuce industry is minimizing foodborne disease outbreaks associated with lettuce products. Studies have been performed to identify plant factors associated with the growth and survival of human enteric pathogens, such as Escherichia coli and Salmonella enterica, on lettuce (Simko et al., Citation2015b; Jacob and Melotto, Citation2019). It is anticipated that molecular markers linked to heritable traits associated with the reduction of harmful human pathogen cells on plants will be used to develop cultivars with a reduced risk of contamination while maintaining lettuce quality and overall performance. Food safety can also be improved by supporting a plant phyllosphere that minimizes the survival of human enteric pathogens. Because the plant genotype plays a vital role in the composition of microbial communities, more comprehensive knowledge of the lettuce genome will likely allow dissection of the complex relationships among plant phenotypes, the microbial communities of the phyllosphere and rhizosphere, and the pathogens and pests living in the lettuce habitat.

VI. Spinach

The cultivated spinach (Spinacia oleracea L.), a member of the amaranth family (Amaranthaceae), is an annual, diploid (2n = 2x = 12), dioecious, and highly heterozygous and heterogeneous species. The worldwide annual production of this leafy vegetable is ∼30.1 million metric tons, with China producing ∼91% of the total (FAOSTAT, Citation2020). The United States is the second-largest spinach producer, with an annual production of 0.44 million metric tons. Mild summer and fall temperatures in California and mild winter temperatures in Arizona assure year-round fresh spinach production, contributing to ∼90% of the U.S. production, with New Jersey and Texas being the other two prominent spinach producing states (USDA-NASS, Citation2019).

Spinach is a highly nutritious leafy green vegetable containing high amounts of proteins, vitamins, minerals, and flavonoids, and is low in calories (Cao et al., Citation1996; Howard et al., Citation2002). Further, spinach is a rich source of iron, lutein, folate, and carotenoids (Bunea et al., Citation2008; USDA-ARS, Citation2020), and with a high level of antioxidants and phenolic compounds (Howard et al., Citation2002; Yosefi et al., Citation2010) it is considered a superior food for human health. Spinach is consumed fresh or cooked and is often mixed with other foods. Due to its presumed health benefits, the demand for FM spinach consumption has doubled in the U.S. during the past decade (USDA-NASS, Citation2016).

Based on leaf shape and structure, spinach is classified as savoy, semi-savoy, or flat (smooth). Savoy cultivars have wrinkled leaves, semi-savoy cultivars have less wrinkled leaves, and flat cultivars lack wrinkles. Spinach plants are sensitive to photoperiod and temperature, bolting earlier under long days and high summer temperatures (Morelock and Correll, Citation2007). Spinach is a dioecious species; however, the sex of a plant cannot be determined visually at the early stages of plant development. Occasionally, monoecious plants with both pistillate and staminate flowers on the same plant may also appear (Janick and Stevenson, Citation1955; Iizuka and Janick, Citation1971).

Spinach is native to central and southwest Asia (Rubatzky and Yamaguchi, Citation1997). It was domesticated in the area of present-day Iran and was later introduced to eastern Asia and western Europe. The two wild species S. turkestanica Iljin. and S. tetrandra ex M. Bieb. are found in Central Asia surrounding the Caspian Sea (northern Iran), which further supports this region as the center of origin and domestication of cultivated spinach (Ribera et al., Citation2020a). All three known Spinacia species are inter-fertile. Phylogenetic analysis of the three species based on transcriptome sequences revealed that S. oleracea is genetically closer to S. turkestanica than to S. tetrandra (Xu et al., Citation2017a). This study further indicated that S. turkestanica was the wild ancestor of the cultivated species. Recent genetic analysis of the cultivated and two wild species demonstrated a higher genetic variation within S. turkestanica than within S. tetrandra and confirmed that the S. turkestanica accessions were genetically closer to S. oleracea landraces (Ribera et al., Citation2020a).

Increased worldwide demand for spinach, driven by its high content of nutrients and health-benefitting compounds, has drawn attention to genetics and breeding studies of this vegetable. A wealth of spinach genomic resources has been developed in recent years, including a reference genome assembly, transcriptome sequences, and genotype data for the germplasm panel. Because downy mildew, caused by the obligate oomycete pathogen Peronospora effusa (Pfs), continually and severely affects spinach production, substantial attention is given to using host plant resistance to improve disease management strategies. Considerable interest in spinach biofortification exists due to the high content of nutrients and health-benefitting compounds in spinach compared to other vegetable crops. Expansion of spinach production into new areas using new production practices (e.g., hydroponics and vertical farming) may ensure a rapid and continuous supply of locally grown, fresh green vegetables in metropolitan regions, and possibly lead to an additional increase in the consumption of this nutrient-rich vegetable.

A. Genetic resources

More than 2,000 spinach accessions are maintained in the International Spinach Database (https://ecpgr.cgn.wur.nl/LVintro/spinach/). Around 400 accessions, mainly comprising S. oleracea cultivars, landraces, and breeding lines, are available at the USDA-ARS North Central Regional Plant Introduction Station in Ames, Iowa. Recent germplasm collection expeditions of the Centre for Genetic Resources, the Netherlands (CGN) have added 89 S. turkestanica and 49 S. tetrandra accessions to the already existing germplasm in genebanks (van Treuren et al., Citation2020; Ribera et al., Citation2020b). Analyses of the genetic diversity of spinach germplasm with SSR markers (Khattak et al., Citation2007; Kuwahara et al., Citation2014) and target region amplification polymorphism (TRAP) markers (Hu et al., Citation2007) grouped the accessions mainly according to their geographic origins. These studies indicated a higher genetic diversity among western Asia accessions and a narrow genetic divergence between the wild and cultivated Spinacia species. Subsequently, SNP markers were used to characterize the genetic diversity of 343 spinach accessions, including 268 USDA germplasm accessions, 45 commercial cultivars, and 30 University of Arkansas breeding genotypes (Shi et al., Citation2017). This study reported a broad genetic diversity among the spinach accessions, which were divided into subpopulations according to their geographic origin in Asia, Europe, and America. A recent genetic diversity assessment of the cultivated and wild spinach accessions revealed that S. turkestanica was evolutionarily closer to S. oleracea than S. tetrandra was (Ribera et al., Citation2020a). Most recently, SNP genotyping of a panel of 76 S. turkestanica, 16 S. oleracea, and 4 S. tetrandra accessions revealed a substantial differentiation of S. tetrandra genotypes from S. oleracea. Based on the high genetic similarity between S. turkestanica and S. oleracea, it was concluded that S. turkestanica is the immediate progenitor of cultivated spinach.

Spinach germplasm resources, especially the wild Spinacia accessions, are excellent sources for spinach breeding (van Treuren et al., Citation2020). The wild spinach species have been widely utilized to identify resistance against the spinach DM pathogen and incorporate resistant alleles into commercial cultivars. The wild spinach germplasm has been evaluated for numerous desirable traits, including nutrient contents, oxalate content (Mou, Citation2008b), leaf miner resistance (Mou, Citation2008a), leaf spot resistance (Mou et al., Citation2008), and DM resistance (Brandenberger et al., Citation1992). The presence of a wide range of phenotypic variation for several traits indicates the potential value of the available genetic variation for future spinach crop improvement.

The size of the spinach genome is estimated to be ∼989 Mb (Arumuganathan and Earle, Citation1991). The first genome sequence of spinach was recently assembled (498 Mb) and annotated (Dohm et al., Citation2014; Minoche et al., Citation2015). Additional transcriptome sequencing of all three Spinacia species (S. oleracea, S. turkestanica, and S. tetrandra) improved the annotation and gene ontology (GO) assignments (Xu et al., Citation2015). Currently, whole genome sequence assemblies are available for two spinach accessions, Sp75 and Viroflay. The genome of inbred spinach line Sp75 was assembled to 996 Mb and annotated using the whole genome shotgun approach combined with BioNano Genomics optical maps. The scaffolds were anchored using a high-density genetic map (Xu et al., Citation2017a). The six LGs corresponding to six chromosomes covered 463.4 Mb, constituting 47% of the assembled genome. The transcriptomes of 120 cultivated and wild Spinacia accessions were sequenced and annotated, predicting 25,495 protein-coding genes, including 139 NBS-LRR genes known to be involved in plant disease resistance (Xu et al., Citation2017a). The study provided novel information on SNP variants and gene expression profiles and insights into spinach evolution and domestication-related traits (Xu et al., Citation2017a; Collins et al., Citation2019). Furthermore, genome assembly of spinach cultivar Viroflay was recently completed using long-read sequencing technology (https://phytozome-next.jgi.doe.gov/info/Soleracea_Spov3) (Hulse-Kemp et al., Citation2021). The genome assembly comprises 913.5 Mb, with the six main pseudomolecules representing 745 Mb (81.6% of the genome). Genome annotation predicted 34,877 genes in spinach, of which 1,004 were annotated as disease resistance genes. A complete report of the Viroflay genome is expected to be available shortly.
In addition, the genomes of 30 spinach cultivars and near-isogenic lines (NILs) were sequenced at a depth of 30× using the paired-end Illumina approach (Bhattarai et al., Citation2021a), and 480 USDA accessions and commercial cultivars have been re-sequenced at 10× depth (Shi et al., Citation2019). Efforts are underway to generate reference genomes for wild spinach S. turkestanica and S. tetrandra accessions and to re-sequence the wild Spinacia accessions (Ribera et al., Citation2020a). The reference genome assemblies now available for two accessions and the resequencing data generated for hundreds of accessions in recent years have facilitated studying and characterizing the genetic and molecular basis of important spinach traits. These new data will add to the existing genomic resources and platforms that can be used to develop effective molecular breeding strategies for spinach.
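The anchored fractions quoted above for the two assemblies can be checked with simple arithmetic. The short Python sketch below uses only the sizes given in the text; the dictionary layout and function name are illustrative assumptions, not part of any published pipeline.

```python
# Assembly sizes (Mb) as quoted in the text; names are illustrative only.
assemblies = {
    "Sp75": {"assembled_mb": 996.0, "anchored_mb": 463.4},
    "Viroflay": {"assembled_mb": 913.5, "anchored_mb": 745.0},
}

def anchored_percent(assembled_mb, anchored_mb):
    """Return the percentage of an assembly placed on chromosomes."""
    return 100.0 * anchored_mb / assembled_mb

anchored = {name: anchored_percent(**sizes) for name, sizes in assemblies.items()}
for name, pct in anchored.items():
    print(f"{name}: {pct:.1f}% of the assembly anchored to chromosomes")
```

The result for Sp75 comes out slightly below the rounded 47% given in the text, while the Viroflay value matches the quoted 81.6%.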

B. Mapped genes and QTLs

Limited molecular and trait mapping research was performed in spinach until recently, compared to other vegetable crops. Previous molecular genetic studies mainly focused on marker development, genetic diversity studies, genetic mapping, and mapping of disease resistance and sex determination traits. Downy mildew disease caused by P. effusa makes spinach unmarketable, thus making DM a major threat to sustainable baby leaf spinach production. The disease can have a particularly devastating effect on organic production, which makes up half of the total baby leaf production. Spinach resistance to P. effusa is controlled by genes and loci with both qualitative and quantitative effects (Brandenberger, Citation1991; Irish et al., Citation2003; Correll et al., Citation2011). The identification, mapping, and introgression of major genes are the primary focus of DM resistance breeding in spinach (Table 5). Six RPF loci, each providing resistance to multiple races, were hypothesized to provide resistance to all known races of P. effusa (Correll et al., Citation2011). Initial mapping of the RPF1 locus identified the Dm1 marker linked at 1.7 cM on chromosome 3 (Irish et al., Citation2008). Later, the 5B14r marker, designed from a putative resistance gene analog (RGA) derived from BAC-end sequences, was reported to lie 1.7 cM from the RPF1 locus and to co-segregate with the Dm1 marker (Feng et al., Citation2015). Three RPF loci (RPF1, RPF2, and RPF3) were mapped to a 1.5 Mb region of chromosome 3 in three segregating populations, and several co-dominant PCR-based diagnostic markers were identified that could distinguish these loci (Feng et al., Citation2018a). Based on disease reactions, a genotype containing the RPF1, RPF2, and RPF3 loci would be resistant to races 1-16 of P. effusa (Feng et al., Citation2018b). However, such introgression in a single line has not yet been achieved, as the three RPF loci detected in three different genotypes are either very closely linked or allelic.

Table 5. Major genes and QTLs positioned on the molecular linkage map of spinach (Spinacia oleracea).

Annotation of the spinach genome identified a total of 139 NBS-LRR genes that are involved in resistance against pathogens, of which five genes likely to be involved in resistance against DM disease were predicted to be located close to the Dm1 marker region (Xu et al., Citation2017a). The location of the RPF1 locus was narrowed down to the ∼0.37–1.12 Mb region, with three candidate genes potentially controlling the resistance (She et al., Citation2018). Multi-parent cross populations screened with race 13 of P. effusa identified resistance-associated SNP markers located within 0.39–1.20 Mb of chromosome 3, the region containing the three RPF resistance loci (Bhattarai et al., Citation2020a). An association analysis performed on another population identified six SNP markers within 0.66–1.23 Mb of chromosome 3 that were associated with resistance against race 16 of P. effusa (Bhattarai, Citation2019; Bhattarai et al., Citation2021b). All major resistance loci mapped so far have been detected only on the proximal end of chromosome 3. Besides major resistance genes, minor genes and QTLs have been reported following GWAS analysis in the USDA germplasm panel screened with natural pathogen populations under field conditions across years and locations (Bhattarai, Citation2019; Bhattarai et al., Citation2020d). Recently, RNA-seq analysis of resistant and susceptible cultivars inoculated with P. effusa identified potential genes associated with resistance and provided an insight into the molecular mechanism of resistance control (Kandel et al., Citation2020). Additional RNA-seq projects are in progress to elucidate the host-pathogen interactions using different races of P. effusa. These investigations aim to improve our understanding of the genetic mechanisms, genes, and pathways underlying resistance to P. effusa and to develop improved management strategies against downy mildew disease.
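As a rough illustration of how the three reported chromosome 3 intervals relate to one another, the Python sketch below computes the segment they share. It assumes, for illustration only, that all three studies report coordinates on the same reference assembly; the labels and function name are hypothetical.

```python
# Intervals (Mb, chromosome 3) as quoted in the text; labels are illustrative.
intervals = {
    "RPF1 fine mapping": (0.37, 1.12),
    "race 13 multi-parent populations": (0.39, 1.20),
    "race 16 association panel": (0.66, 1.23),
}

def shared_interval(spans):
    """Return the (start, end) common to all spans, or None if they do not all overlap."""
    start = max(s for s, _ in spans)
    end = min(e for _, e in spans)
    return (start, end) if start <= end else None

common = shared_interval(list(intervals.values()))
print(common)  # (0.66, 1.12)
```

Under that assumption the three intervals overlap between 0.66 and 1.12 Mb, which is consistent with the statement that all major resistance loci mapped so far lie at the proximal end of chromosome 3.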

Spinach is primarily a dioecious species, but monoecious lines with varying proportions of male, female, and hermaphrodite flowers are occasionally found (Janick and Stevenson, Citation1955; Onodera et al., Citation2008). The first framework genetic map assigned the sex determination locus to LG 3, located 1.9 cM from the SSR marker “SO4” (Khattak et al., Citation2006). A major gene for monoecy was mapped to a 13.4 cM region around the Y locus, positioning the monoecious gene (Xm) 4.3 cM from the SO4 marker (Onodera et al., Citation2011). A later study confirmed that the monoecious gene (Xm) is not allelic but linked to the dioecious locus (X/Y), and identified markers closely linked to the gene (Yamamoto et al., Citation2014). In contrast to earlier reports, more recent studies mapped the dioecious sex determination locus (X/Y) to two regions (66.98–69.72 and 75.48–92.96 cM) on LG 4 (Qian et al., Citation2017). A follow-up study positioned the male-specific locus in a 21 kb region (58.76–58.78 Mb) on chromosome 4 and described a KASP-based marker assay (SponR) that can distinguish XX, XY, and YY plants (She et al., Citation2021). Markers developed to distinguish sex forms in spinach allow early selection of desirable plants, thus increasing breeding efficiency.
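The practical value of a linked marker depends on how often recombination separates it from the target locus. The Python sketch below converts the map distances quoted above into approximate recombination fractions using the Haldane mapping function. This choice is an assumption made for illustration; the cited studies may have used a different mapping function (e.g., Kosambi), and at such short distances the fraction is close to the distance in Morgans either way.

```python
import math

def haldane_recombination_fraction(distance_cm):
    """Convert a map distance in centimorgans to a recombination fraction (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

# Distances from the SO4 marker as quoted in the text.
for label, distance in [("sex determination locus", 1.9), ("monoecious gene Xm", 4.3)]:
    r = haldane_recombination_fraction(distance)
    print(f"{label}: {distance} cM -> recombination fraction of about {r:.3f}")
```

Read this way, a marker 1.9 cM from the target would be expected to misclassify roughly 2% of progeny per meiosis, and a marker 4.3 cM away roughly 4%.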

Genetic maps constructed with GBS markers were used to map QTLs for leaf color in spinach (Cai et al., Citation2018). Another study constructed linkage maps and identified QTLs for growth under low nitrogen conditions to improve nitrogen use efficiency (NUE) (Chan-Navarrete et al., Citation2016). An RNA-seq approach was used to characterize genes expressed in spinach root and leaf in response to nitrogen stress and to identify molecular mechanisms and pathways involved in nitrogen use efficiency (Joshi et al., Citation2020). Around 400 USDA accessions and commercial cultivars were genotyped using GBS (Shi et al., Citation2017). Genome-wide association studies performed on this set of accessions identified SNP regions associated with leaf surface texture, petiole color, and edge shape (Ma et al., Citation2016), earliness of bolting, plant height, and erect leaf traits (Chitwood et al., Citation2016), low oxalate content (Shi et al., Citation2016b), mineral composition (Qin et al., Citation2017), and vitamin C content (Kunz et al., Citation2020). Additional analyses of this germplasm panel revealed genomic locations of resistance loci against leafminer (Liriomyza spp.) (Shi et al., Citation2016a), Verticillium wilt caused by race 2 (isolate So 923) of V. dahliae Kleb. (Shi et al., Citation2016d), leaf spot disease caused by Stemphylium botryosum (Shi et al., Citation2016c) and S. vesicarium (Bhattarai et al., Citation2020b), anthracnose disease caused by Colletotrichum dematium (Awika et al., Citation2020), white rust caused by Albugo occidentalis (Awika et al., Citation2019), Fusarium wilt caused by Fusarium oxysporum f. sp. spinaciae (Fos) (Gyawali et al., Citation2019), and the diseases caused by Pythium species (ongoing project).

C. Marker-assisted selection and genomic selection

The major limitations to FM spinach production are diseases, particularly DM, Fusarium wilt, white rust, and leaf spot. Therefore, the major goal of spinach breeding is to develop cultivars with improved disease resistance, as well as with tolerance to abiotic stresses, slow bolting, improved yield, and quality, increased levels of beneficial nutrients, and decreased levels of nitrate, oxalate, and cadmium (Morelock and Correll, Citation2007; Andersen and Torp, Citation2011; Correll et al., Citation2011). Despite beneficial nutrients, spinach contains a relatively large amount of oxalic acid (Mou, Citation2008b) that may convert to calcium oxalate in the human body leading to kidney stones (Noonan and Savage, Citation1999). Further, spinach may take up a high level of cadmium from the soil, which can lead to a variety of adverse effects on human health. However, efforts are underway to identify germplasm with a low concentration of cadmium in the plant and determine the genetic basis of cadmium uptake and accumulation (Greenhut, Citation2018).

Molecular and functional genomics research in spinach has progressed in recent years. Studies have focused on investigating the cultivated and wild genetic resources using molecular markers to determine genetic diversity and population structure and to develop fingerprinting assays (Shi et al., Citation2017; van Treuren et al., Citation2020; Ribera et al., Citation2020a; Citation2020b; Bhattarai et al., Citation2021). Regarding plant disease resistance, genetic research has focused primarily on DM, the disease that causes the largest economic losses in spinach production. Markers linked to DM resistance genes suitable for MAS have been reported (Irish et al., Citation2008; Feng et al., Citation2018a; Bhattarai et al., Citation2020a), and assays developed using these markers would be preferred by breeding programs over the currently used labor-intensive disease screening (Feng et al., Citation2014; Bhattarai et al., Citation2020c). Current commercial spinach cultivars are mostly hybrids produced from true monoecious lines (Yamamoto et al., Citation2014; Janick, Citation2015). Several studies have examined the genetic control of sex in spinach and identified markers to determine it, which may help in the selection process. Molecular markers associated with several biotic and abiotic stresses, plant morphology, and the content of nutritional and health-related compounds have been reported; however, they have not been tested and validated, and practical marker assays are yet to be developed. Genomic selection has not yet been utilized in spinach breeding, though several phenotypes evaluated in the USDA germplasm panels are being explored for GS assessment. The recent upsurge of whole-genome sequence data for spinach germplasm has resulted in millions of SNP markers (Shi et al., Citation2019) that can be used for GWAS and GS studies and to develop effective molecular breeding strategies.

D. Future outlook

The demand for spinach is growing in the U.S. and worldwide owing to consumers’ increased interest in a healthier diet. To meet the growing demand and sustainable production, the breeding of spinach has focused on improving spinach yield and quality, disease resistance, abiotic stress tolerance, slow bolting, increasing the content of beneficial nutrients while minimizing the content of undesirable compounds. The Spinacia germplasm is diverse and offers excellent resources for identifying new traits, particularly for resistance to diseases and insects (van Treuren et al., Citation2020; Ribera et al., Citation2020b). The rapid emergence of new DM races remains the most critical challenge, needing an improved understanding of the mechanism regulating susceptibility and resistance. Long-term goals are to develop strategies to optimize the use of host resistance by exploiting all qualitative, quantitative, and susceptibility factors to manage DM and to achieve sustainable and profitable spinach production. Current spinach DM research efforts are focused on detailed characterization of host-pathogen interactions, identifying and mapping qualitative and quantitative resistances, functional tests of RPF loci by knocking down the RPF genes for potential susceptibility and/or transforming the susceptible lines with the resistant alleles, and characterization of effector genes. Future studies are expected to integrate phenotypic data with molecular and genomic data to identify genes and pathways involved in disease resistance and provide more detailed insights into resistance mechanisms. Identifying functional gene-based markers that are now possible with new genomic data should allow efficient stacking of multiple DM-resistant RPF genes in a single cultivar.

Further exploration of phenotypic and genetic variation among the diverse genetic resources (germplasm accessions, landraces, and wild accessions) will allow the identification of novel traits, alleles, and genes. A combination of GWAS, linkage mapping, comparative genomics, and pan-genomics studies will help to detect QTL regions and candidate genes involved in regulating traits. However, more efficient genetic transformation and gene editing systems in spinach are still needed to verify detected loci, characterize gene functions, remove deleterious alleles, and introduce beneficial ones. Targeted approaches are being set up to understand the mechanisms that control traits. At the same time, the MAS and GS methods now being optimized show promise for breeding superior spinach cultivars with desirable traits in the near future.

VII. Cucumber

Cucumber, Cucumis sativus L. (2n = 2x = 14, family Cucurbitaceae), is among the most cultivated and consumed vegetable crops in the world. In 2018, cucumber was harvested from ∼1.98 million hectares of land with a total production of 75.22 million metric tons. The top five producers are China, Iran, Russia, Turkey, and the U.S. (FAOSTAT, Citation2020). There are ∼50 species in the genus Cucumis, including melon (C. melo L., 2n = 2x = 24) and the sister species of cucumber C. hystrix (2n = 2x = 24), which diverged from the cucumber lineage ∼10 and 5 MYA, respectively (Sebastian et al., Citation2010; Yang et al., Citation2014). Cucumis hystrix is the only known species in the genus Cucumis that is sexually compatible with cucumber (Chen et al., Citation1997; Han et al., Citation2016b) and could be considered a secondary gene pool for cucumber breeding.

The primary gene pool of cucumber consists of four cross-compatible botanical varieties: the cultivated cucumber (C. sativus var. sativus), the wild cucumber (C. sativus var. hardwickii), the semi-wild Xishuangbanna (XIS) cucumber (C. sativus var. xishuangbannanesis), and the Sikkim cucumber (C. sativus L. var. sikkimensis). The wild cucumber is widely distributed in south and southeastern Asian countries, is significantly differentiated from the other three taxa, and is the progenitor from which modern cucumbers were domesticated (Yang et al., Citation2012). The semi-wild XIS cucumber from Southwest China and surrounding regions exhibits some unique traits, such as very large fruit, orange flesh color due to accumulation of high levels of β-carotene, late flowering, and strong seed dormancy in some accessions, which are likely the results of diversifying selection after cucumber domestication (Qi et al., Citation1983; Bo et al., Citation2012, Citation2015; Pan et al., Citation2017). The Sikkim cucumber is distributed mainly in the Sikkim region of India and in Nepal, and is characterized by black spines, brown fruit with fine and heavy netting, and a large hollow in the mature fruit. The Sikkim cucumber could be considered an ecotype of the cultivated cucumber that was under selection for local adaptation (Wang et al., Citation2021).

India is the center of cucumber diversity, where it has been cultivated for at least 3,000 years (Candolle, Citation1959; Sebastian et al., Citation2010). Cucumber spread eastward to China ∼2,000 YAGO, and westward to Europe ∼1,500–700 YAGO (Keng, Citation1973; Paris et al., Citation2012; reviewed in Weng, Citation2021). Since its dispersal from India, natural and human selections have reshaped the cucumber, resulting in many ecotypes or landraces with adaptation to local climates, production systems, specific processing requirements, and consumer preferences. Cucumbers in different geographic regions are morphologically diverse in fruit size, skin color and texture, fruit firmness, crispness, and taste (Wehner, Citation1989), all resulting from selection to accommodate fresh consumption or processing (pickles). Modern cucumber breeding has intensified the divisions between the two types, resulting in several market classes adapted to large-scale commercial production in diverse environments for various purposes (Weng, Citation2021). Commercial cucumber breeding has also resulted in genetic erosion, and each market type seems to have a very narrow genetic base (Dijkhuizen et al., Citation1996). Hundreds of cucumber accessions are preserved in several major gene banks across the world (Weng and Sun, Citation2011). Molecular marker analysis of these accessions reveals that only a small portion of the genetic diversity found in cucumber from the diversity center is present in landraces or commercial cultivars from other regions (Lv et al., Citation2012; Qi et al., Citation2013; Wang et al., Citation2018a); this observation suggests that gene bank collections remain a valuable source of genetic variation for future cucumber breeding.

A. Genetic and genomic resources

For cucumber, a minor specialty crop, only limited genetic and genomic resources were previously available compared with many major horticultural and field crops. Recent advances in technology and instrumentation for genome sequencing, however, have provided exciting opportunities to expedite cucumber genetic and breeding research. Among major horticultural crops, cucumber was the first with a publicly available draft genome sequence, which was for genotype “9930,” a Chinese Long type (Huang et al., Citation2009; Li et al., Citation2011). Subsequently, three more cucumber genomes were sequenced, including the North European pickling type variety B10 (v1.0), the U.S. pickling inbred line Gy14 (v1.0), and the wild accession PI 193967 (Wóycicki et al., Citation2011; Yang et al., Citation2012; Qi et al., 2013). The initial assemblies, generated using Illumina or 454 sequencing technologies, had several shortcomings, such as low continuity, limited coverage (∼55% of the 367 Mb genome), and poor genome annotation. New versions of the 9930 (v3.0) and B10 (v3.0) genomes, however, have recently been released (Li et al., Citation2019b; Osipowski et al., Citation2020). The 9930 v3.0 draft genome has 211 Mb in seven pseudomolecules, with 24,317 annotated genes, an estimated 37.7% repetitive sequences, and an N50 contig/scaffold size of 8.9/31.1 Mb (Li et al., Citation2019b). This is a significant improvement compared with the 9930 v2.0 genome assembly. The 9930 v3.0 assembly, however, covers only ∼60% of the genome, suggesting the presence of complex repetitive DNA sequences that remain intractable despite the use of state-of-the-art sequencing technologies. Nonetheless, all the current assemblies likely contain the majority of the coding regions of the cucumber genome, facilitating the development of molecular markers and linkage maps to help with genetic studies and molecular breeding.
During the past decade, several high-density genetic maps of cucumber have been developed (Ren et al., Citation2009; Cavagnaro et al., Citation2010; Yang et al., Citation2013; Rubinstein et al., Citation2015; Zhou et al., Citation2015), and more than 300 cucumber lines re-sequenced (Qi et al., 2013; Bo et al., Citation2016; Liu et al., Citation2019a). Further, over 1,200 cucumber accessions in the USDA national germplasm collection have been genotyped using GBS technology, resulting in the public availability of hundreds of thousands of molecular markers (Wang et al., Citation2018a).
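The assembly statistics quoted above (N50 and genome coverage) can be illustrated with a minimal Python sketch. The function names and the toy contig lengths are hypothetical and are not taken from any cucumber assembly. The coverage calculation simply divides the 211 Mb assembled in 9930 v3.0 by the 367 Mb genome size quoted in the text, assuming that this size estimate applies to the newer assembly as well.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def assembly_coverage(assembled_mb, estimated_genome_mb):
    """Fraction of the estimated genome represented in the assembly."""
    return assembled_mb / estimated_genome_mb


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [50, 40, 30, 20, 10]
print("Toy N50:", n50(toy_contigs))

# Figures quoted in the text: 211 Mb assembled, 367 Mb estimated genome size.
print("Coverage: {:.1%}".format(assembly_coverage(211, 367)))
```

Under these assumptions the computed coverage falls slightly below 60%, in line with the approximate value given for the 9930 v3.0 assembly.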

B. Mapped genes and QTLs, and marker-assisted selection

The availability of draft genome sequences, coupled with cost-effective high-throughput genome sequencing and genotyping technologies, has greatly expedited molecular mapping and QTL analysis in cucumber. Further, the relatively small genome size with no recent whole-genome duplication (Huang et al., 2009) and the annual growth habit with a relatively short life cycle (2-3 months from seed to seed) offer great advantages for genetic studies in cucumber (Weng, Citation2016). Indeed, we have witnessed exponential growth of cucumber publications in this field. For example, the most recent Cucumber Gene Catalog (Weng and Wehner, Citation2017) documented 199 genes or major-effect QTLs, of which 70 were added since 2009 when the first draft genome of cucumber was released. Among the cucumber genes described in the 2010 gene catalog, only a few had known chromosomal locations and only one had a known candidate gene, the femaleness (F) locus coding the 1-aminocyclopropane-1-carboxylate synthase in the ethylene biosynthesis pathway. Recently, Wang et al. (Citation2019a) conducted an extensive literature review of the mutants, genes, and QTLs that were genetically mapped or characterized in cucumber. They documented 81 simply inherited major genes and QTLs that were cloned or fine mapped. For each gene, detailed information was presented, including chromosomal locations, allelic variants and associated polymorphisms, predicted functions, and linked markers that could be used for MAS in cucumber breeding (Table 6). They also documented 322 QTLs for 42 quantitative traits, including 109 for resistances against seven pathogens. Further, through collaborative efforts among public cucumber researchers and commercial breeders, the authors identified 130 quantitative traits and developed a set of recommendations for QTL nomenclature for these traits in cucumber (Wang et al., Citation2019c).

Table 6. Major genes and QTLs for disease resistance traits in cucumber (Cucumis sativus).

Pan et al. (Citation2020a) reviewed QTLs identified previously for fruit size, shape, and weight in cucumber, melon, and watermelon, from which they identified 150 consensus QTLs. Additionally, the authors identified 253 homologs of eight classes of fruit or grain size/weight-related genes cloned in other plant species, which revealed the widespread structural and functional conservation of fruit size/shape gene homologs in cucurbits. The rapid progress in mapping and cloning of genes/QTLs in cucumber is reflected in a large number of publications in this field. The aforementioned two recent reviews (Wang et al., Citation2019c; Pan et al., Citation2020a) included the literature until June 2019. In 2019 alone, more than 20 studies were published that reported identification of new candidate genes for simply inherited mutants or QTLs in cucumber. These publications covered a variety of traits, such as resistance to cucumber vein yellowing virus (CVYV), powdery mildew and angular leaf spot (Pujol et al., Citation2019; Zhang et al., Citation2020a; Citation2020b; Citation2021), tolerance to abiotic stresses (heat and cold) (Dong et al., Citation2019; Citation2020; Wang et al., Citation2019b), plant architecture (Wen et al., Citation2019a; Njogu et al., Citation2020), flowering time, fruit size, shape, skin features, pedicel direction, and internal quality attributes (Rett-Cadman et al., Citation2019; Sun et al., Citation2019; Zhang et al., Citation2019, Citation2020b; Gao et al., Citation2020; Sheng et al., Citation2020; Song et al., Citation2020; Pan et al., Citation2020b; Wang et al., Citation2020a; Citation2020b; Wang et al., Citation2021), and leaf color and shape mutations (Ding et al., Citation2019; Liu et al., Citation2019; Hu et al., Citation2020).

A large number of molecular markers developed for horticulturally important traits facilitate MAS breeding in cucumber (Wang et al., Citation2019c; Feng et al., Citation2020; Hao et al., Citation2020). MAS and QTL pyramiding are of particular importance in breeding for multiple disease resistance in cucumber, as these traits are often controlled by multiple, recessively inherited QTLs (Wang et al., Citation2019c). This is especially true for international vegetable seed companies, which routinely conduct MAS in cucumber breeding.
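Why pyramiding recessive resistance QTLs benefits from markers can be illustrated with a short Python sketch. It assumes an F2 population, unlinked loci, and standard Mendelian segregation; these are textbook simplifications rather than conditions reported in the cited studies, and the function names are hypothetical. Under these assumptions, a plant homozygous for the recessive allele at every one of n loci occurs with frequency (1/4)^n, so the population size needed to recover at least one such plant grows quickly with n.

```python
import math


def freq_all_recessive_f2(n_loci):
    """Expected F2 frequency of plants homozygous recessive at n unlinked loci."""
    return 0.25 ** n_loci


def min_population_size(target_freq, confidence=0.95):
    """Smallest population in which at least one target plant is expected
    to occur with the given probability, assuming independent plants."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - target_freq))


for n in (1, 2, 3, 4):
    freq = freq_all_recessive_f2(n)
    size = min_population_size(freq)
    print(f"{n} loci: target frequency {freq:.5f}, plants needed {size}")
```

Because heterozygous carriers of recessive alleles cannot be recognized by phenotype, linked markers allow such carriers to be identified and advanced across generations instead of relying on a single large screening population.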

C. Future outlook

The availability of genomic and genetic resources in cucumber is revolutionizing cucumber genetic studies and breeding activities, though many challenges remain. The newest genome assembly (9930 v3.0) covers only ∼60% of the cucumber genome. A genome assembly with better coverage and annotation is needed. A large amount of genomic data is being continuously accumulated; however, it is challenging to analyze these data and establish sequence-trait associations. Numerous genes and QTLs for horticulturally important traits have been reported, yet many of them have not been validated in different genetic backgrounds or environments. It is important to continue exploring cucumber germplasm resources for novel resistance genes or alleles to diversify the gene pool and avoid resistance breakdown. Different sources of resistance to the same pathogen, or multiple resistance genes for various pathogens, could be pyramided in new cultivars via MAS. A very limited number of genes or QTLs have been cloned in cucumber, and their functions are largely unknown. Further, a reliable and efficient genetic transformation system is lacking for cucumber, which hinders studies of gene function and gene editing for cucumber improvement.

VIII. Chicory

Chicory (Cichorium intybus L., 2n = 2x = 18) is one of the most popular horticultural crops in the world. Although there are large differences in cultivation techniques and cultural uses, chicory is widely consumed as a leafy vegetable and it is found in almost every corner of the world and included in the diet of most Western and Eastern cultures. It belongs to the Asteraceae, a very large botanical family with ∼23,000 species, subdivided into 1,535 genera and grouped into three subfamilies: Asteroideae, Barnadesioideae, and Cichorioideae (Bremer and Anderberg, Citation1994). In the subfamily Cichorioideae, the tribe Lactuceae includes the genus Cichorium, with different horticultural species recognized according to their origin and uses (Barcaccia et al., Citation2016).

Integrating data from the analysis of morphological descriptors and molecular markers with diffusion, production, and commercial indicators, C. intybus and C. endivia appeared as the two most popular cultivated species (Kiers, Citation2000; Lucchin et al., Citation2008; Barcaccia et al., Citation2016). Considering their taxonomy, within these two distinct species of the genus Cichorium, the subspecies intybus L. and glabratum (C. Presl) Arcang. were recognized for C. intybus, whereas the subspecies endivia Hegi and pumilum (Jacq) Cout. were established for C. endivia L. (Lucchin et al., Citation2008; Barcaccia et al., Citation2016). The botanical varieties and cultivar groups of C. intybus subsp. intybus are several and classified as follows: var. foliosum (Witloof chicory), var. porphyreum (Pain de Sucre), var. latifolium (Radicchio), var. sylvestre (Catalogne), and var. sativum (Root chicory) (Barcaccia et al., Citation2016 and references therein). Within C. intybus, cultivated chicory types are biennial whereas wild chicory types are perennial plants.

Most likely known by the Egyptians as a medicinal plant and used as a vegetable crop by the ancient Greeks and Romans, chicory gradually underwent a process of naturalization in Europe (Lucchin et al., Citation2008 and references therein). Currently, wild C. intybus covers a great portion of the entire European continent, where leafy products from chicory landraces have traditionally become a part of the diet of local populations as an important ingredient of typical local dishes (Lucchin et al., Citation2008; Barcaccia et al., Citation2016). Further, cultivated varieties of the different C. intybus cultivars and biotypes are mainly grown throughout continental Europe, in southwestern Asia, and in limited areas of North America, South Africa, and Australia (Barcaccia et al., Citation2016). In horticultural markets, leaf chicory traditionally includes all the cultivar groups whose commercial products are the leaves, which are used in the short food supply chain (for preparation of both cooked and fresh salads), whereas all the other types, whose commercial products are derived from the roots and destined for either industrial transformation (inulin extracts) or human consumption (coffee substitutes), are classified as root chicory (Barcaccia et al., Citation2016 and references therein).

Chicory is commonly an allogamous species due to an efficient sporophytic self-incompatibility (SSI) system and consistent entomophilous pollination that favors outcrossing (Barcaccia et al., Citation2003a; Citation2003b; Lucchin et al., Citation2008). Furthermore, hybridization among plants is also promoted by floral morphological barriers that hinder selfing, and physiological mechanisms that boost germination and growth of pollen grains and tubes in case of outcrossing (Lucchin et al., Citation2008; Barcaccia et al., Citation2016). Commercial seeds of OP varieties, synthetic varieties, and F1 hybrids are available on the global chicory market and are adopted in large-scale farming systems; however, a great proportion of chicory is planted in many small farming units, using seed of local varieties selected and maintained through mass selection by individual farmers (Barcaccia et al., Citation2003b; Patella et al., Citation2019b).

A. Genetic resources

Extensive lists of Cichorium species, subspecies, botanical varieties, and cultivar groups have been published, some of which are accessible via the internet. The most complete repository is through the USDA Germplasm Resources Information Network (GRIN) database (https://www.ars-grin.gov/), in which several hundred accessions are listed. A smaller list is provided in Mansfeld’s World Database of Agricultural and Horticultural Crops developed at IPK in Germany (https://mansfeld.ipk-gatersleben.de/), which includes <100 accessions.

In Italy, where chicory is widely cultivated, especially in the north-eastern regions, the plant materials grown by farmers were for a long time represented by local varieties known to possess variation and adaptation to the natural and anthropological environment where they originated and are still widely cultivated (Lucchin et al., Citation2008; Barcaccia et al., Citation2016 and references therein). Such local varieties were conserved and multiplied by farmers as OP populations via phenotypic selection, and thus they were highly heterozygous and heterogeneous. Although a considerable range of phenotypic variation within each population was present across all cultivated types, clear genetic differentiation was also noticeable among populations for various traits and molecular markers (Lucchin et al., Citation2008; Barcaccia et al., Citation2016 and references therein).

During the past two decades, the agricultural scenery in the Mediterranean countries has profoundly changed for chicory cultivations, where subsistence mixed farming units have been transformed into extensive farming systems growing mainly modern improved varieties instead of local varieties (i.e., farmer populations). In recent years, professional breeders have developed protocols based on controlled hybridizations among chosen individual plants to obtain genetically improved synthetic varieties showing higher distinctiveness, uniformity, and stability for both agronomic and esthetical characteristics (Barcaccia et al., Citation2003b; Patella et al., Citation2019b). The modern breeding programs aim to isolate individuals within the best local populations for the selection of inbred lines suitable for the production of commercial F1 hybrids (Barcaccia et al., Citation2003b; Patella et al., Citation2019b). These programs are increasingly assisted by the use of molecular markers (Ghedina et al., Citation2015; Galla et al., Citation2016; Patella et al., Citation2019b).

B. Mapped genes and QTLs

Several saturated genetic linkage maps spanning the entire genome (∼2.6 Gb) are available for leaf and root chicory (Cadalen et al., Citation2010; Gonthier et al., Citation2013; Muys et al., Citation2014; Palumbo et al., Citation2019); these maps are highly useful for molecular genetics and breeding studies of chicory. The first genetic linkage map of chicory is of particular interest; it included 431 SSR and 41 STS markers placed onto nine LGs covering 878 cM of the genome (Cadalen et al., Citation2010). This consensus map was constructed by integrating molecular marker data derived from segregating populations of one Witloof and two root chicory types. More recently, the first high-density genetic linkage map of chicory was constructed using GBS technology and the leaf chicory variety Radicchio (Palumbo et al., Citation2019); this map contained 727 SNP markers assembled into nine LGs covering a total length of 1,413 cM of the genome. The current genetic linkage maps of chicory (Cadalen et al., Citation2010; Gonthier et al., Citation2013; Muys et al., Citation2014; Palumbo et al., Citation2019) represent a starting point for mapping single genes and QTLs (Table 7). For example, the maps have been successfully used for fine mapping of self-incompatibility and MS genes (Gonthier et al., Citation2013; Palumbo et al., Citation2019), providing a basis for understanding the genetic control of reproductive barriers in chicory and their applications for the production of F1 hybrids (Barcaccia et al., Citation2016). In root chicory, molecular markers linked to the nuclear male sterility (NMS1) locus (LG 5) and SSI locus (LG 2) were identified (Cadalen et al., Citation2010; Gonthier et al., Citation2013). Similarly, in leaf chicory, two SSR markers were closely linked to another NMS locus (ms1) on LG 4 (Cadalen et al., Citation2010; Barcaccia and Tiozzo, Citation2012).
More recently, several SNP markers were discovered that fully co-segregated with this MS locus (Barcaccia and Tiozzo, Citation2014; Palumbo et al., Citation2019). A subsequent mesosynteny analysis revealed that as many as 10 genomic DNA sequences encompassing the selected SNP variants of chicory mapped in a peripheral region of chromosome 5 of lettuce (L. sativa) spanning about 18 Mb. Overall, these molecular marker data could be used for genotyping plant material and for MAS in leaf chicory breeding.
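The marker densities of the two maps described above can be compared with a simple calculation, sketched here in Python. The sketch assumes that every marker occupies a unique map position and that markers are spread over the nine LGs, so the number of intervals equals the number of markers minus the number of LGs. Co-segregating markers, which are common in GBS-based maps, would make the true average spacing larger. The function name is hypothetical.

```python
def mean_marker_interval(map_length_cm, n_markers, n_linkage_groups):
    """Average distance (cM) between adjacent markers, assuming each
    marker has a unique position within its linkage group."""
    n_intervals = n_markers - n_linkage_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / n_intervals


# Values quoted in the text.
consensus_map = mean_marker_interval(878, 431 + 41, 9)   # SSR + STS markers
gbs_map = mean_marker_interval(1413, 727, 9)             # SNP markers

print(f"Consensus SSR/STS map: {consensus_map:.2f} cM per interval")
print(f"GBS-based SNP map:     {gbs_map:.2f} cM per interval")
```

Under these simplifying assumptions both maps average roughly one marker every 2 cM, although the GBS-based map spans a considerably longer total genetic distance.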

Table 7. Major genes and QTLs positioned on the molecular linkage map of chicory (Cichorium intybus).

The genetic linkage maps of chicory (Cadalen et al., Citation2010; Palumbo et al., Citation2019) have been essential for the development of robust genotyping methods using SSR and SNP markers (Ghedina et al., Citation2015; Galla et al., Citation2016; Palumbo et al., Citation2019). These methods, including various types of molecular markers, can be effectively used to assess genetic distinctness and population structure of various types of commercial varieties of leaf chicory, such as synthetics, F1 hybrids, and F2 populations (Patella et al., Citation2019b). It is expected that marker genotyping will also find practical utility for evaluating the genetic distinctiveness, uniformity, and stability of seed lots belonging to commercial varieties.

C. Marker-assisted selection and genomic selection

Historically, most cultivated varieties of chicory have been developed using mass selection to obtain uniform populations characterized by high yield and suitable commercial standards (Barcaccia et al., Citation2016 and references therein). Currently, two genetically distinct types of chicory cultivars are on the market: OP or synthetics, and F1 hybrids (Barcaccia et al., Citation2003b; Patella et al., Citation2019b). Newly released cultivars are mostly synthetics, developed through inter-crossing or poly-crossing among many selected parental individuals or clonal lines, followed by progeny testing to assess general combining ability (Barcaccia et al., Citation2003b; Barcaccia et al., Citation2016). By their nature, synthetics have a wide genetic base represented by a mixture of highly heterozygous and heterogeneous individuals, yet showing rather similar phenotypes. In recent years, however, developing F1 hybrid cultivars has become more common, mainly done in the private sector (Barcaccia et al., Citation2016; Patella et al., Citation2019b). Experimental data on how these hybrids are developed are currently scarce, and presumably, each company employs its own protocol depending on genetic material used and the system(s) of pollination control during inbred line development and F1 hybrid seed production (Barcaccia et al., Citation2016). In general, the strong SI system in chicory has been a great barrier for the development of parental inbred lines or clones used to produce single-cross hybrids (Lucchin et al., Citation2008; Barcaccia et al., Citation2016; Patella et al., Citation2019b). However, there has been an increased interest in the production of F1 hybrids due to the discovery of MS genes (Gonthier et al., Citation2013; Palumbo et al., Citation2019 and references therein). For instance, an increasing number of cultivars of the Witloof and Radicchio types are commercialized as true F1 hybrids. 
Further, owing to the economic benefits, most newly released varieties of leaf chicory are F1 hybrids, mainly developed by European seed companies. Moreover, most commercial breeding programs have improved their efficiency during the past several years due to the use of genomic tools. Various types of genetic markers, including SSRs, ESTs, and SNPs, have been implemented for genotyping elite breeding stocks of leaf chicory (Ghedina et al., Citation2015; Palumbo et al., Citation2019; Patella et al., Citation2019b). The available data show that markers have been reliable for assessing multi-locus genotypes of individual plants, breeding stocks, and lineages, including assessing the degree of homozygosity of inbred lines and their genetic stability. Moreover, markers have also been used to accurately estimate the specific combining ability between parental lines, as judged based on their genetic diversity and the predicted degree of heterozygosity in their F1 hybrid progeny. Such information could be utilized for planning 2-way crosses and predicting heterosis of the experimental F1 hybrids based on genetic distance and allelic divergence between parental inbred lines. Information on the parental genotypes would also allow protection of newly registered cultivars and assessment of the genetic purity and identity of the seed stocks of commercial F1 hybrids.
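The idea of predicting F1 heterozygosity from parental marker genotypes can be sketched in Python as follows. This is a minimal illustration under stated assumptions (diploid parents, independent marker loci, random union of gametes), not the procedure used in the cited studies. The genotypes, line names, and function names are hypothetical.

```python
from collections import Counter


def gamete_freqs(genotype):
    """Allele frequencies among the gametes of a diploid genotype, e.g. ('A', 'G')."""
    counts = Counter(genotype)
    return {allele: n / len(genotype) for allele, n in counts.items()}


def expected_f1_heterozygosity(parent1, parent2):
    """Mean probability, over loci, that an F1 plant is heterozygous."""
    if len(parent1) != len(parent2) or not parent1:
        raise ValueError("parents must be genotyped at the same non-empty set of loci")
    total = 0.0
    for g1, g2 in zip(parent1, parent2):
        f1, f2 = gamete_freqs(g1), gamete_freqs(g2)
        # Probability that both gametes carry the same allele at this locus.
        p_homozygous = sum(f1[a] * f2.get(a, 0.0) for a in f1)
        total += 1.0 - p_homozygous
    return total / len(parent1)


# Hypothetical genotypes at four marker loci.
line_a = [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")]
line_b = [("A", "A"), ("T", "T"), ("G", "A"), ("C", "C")]

print(expected_f1_heterozygosity(line_a, line_b))
```

For two fully homozygous inbred lines, this quantity reduces to the proportion of loci at which the parents carry different alleles, which is one simple measure of the genetic distance between them.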

D. Future outlook

Local farmer varieties of chicory represent invaluable genetic resources, which should be collected and preserved in gene banks for characterization and future exploitation by breeding programs. Most of this germplasm is OP farmer-derived populations as well as local synthetics that typically exhibit a great deal of genetic diversity in morphological and physiological characteristics, highly desirable to breeding programs. However, it appears that variation in traits related to biotic and abiotic stresses is scarce in the local populations. The reason could be that farmer selection was traditionally focused on morphological and esthetical characteristics important to the market, instead of selecting for disease resistance, abiotic stress tolerance, or post-harvest quality traits. It is, therefore, important to identify and characterize genetic resources for the latter traits and use new molecular marker technologies to identify the underlying genes or QTLs to be utilized in breeding programs. Such technologies have been extensively and successfully used in many other horticultural crops, and it is expected that they would be of great help in advancing basic genetic knowledge and applied breeding progress in chicory.

In chicory, next-generation breeding programs currently include several selection steps based on MAS and MABC applications (Ghedina et al., Citation2015; Barcaccia et al., Citation2016; Palumbo et al., Citation2019; Patella et al., Citation2019b). Molecular markers are now routinely adopted in this species to predict and select single plant reproductive barriers (e.g., SI and MS), to develop parental inbred lines or clones, and to assess their specific combining ability to better exploit potential heterosis in F1 hybrids (Ghedina et al., Citation2015; Patella et al., Citation2019b). This information also enables chicory breeders to determine the genetic distinctness, uniformity, and stability of commercial varieties (i.e., DUS testing).

IX. Prospects, perspectives, and direction of future research on vegetable crops

With the availability of reference genomes and new genomic tools, including GBS, BSA-seq, GWAS, and GS, vegetable crop breeding in the twenty-first century relies heavily on the use of molecular markers and NGS data. The recent application of modern technologies has already led to substantial progress in our understanding of the genetics of many vegetable crops and significantly improved the efficiency and accuracy of breeding programs. For example, advanced DNA technologies have expedited the identification and use of molecular markers and candidate genes associated with important horticultural characteristics, leading to more effective and efficient breeding strategies. Although some vegetable crops, including eggplant, spinach, and chicory, lag in the availability and use of new technologies for breeding purposes, for many crops MAS is already standard practice for numerous traits, in particular disease resistance and fruit quality characteristics, in both public and private breeding programs. It is expected that MAS will become more common for additional traits and other vegetable crops in the near future. The ease of marker development and application, as well as the decreasing cost of marker genotyping, makes MAS a primary breeding choice for many traits and in most breeding programs.

Unlike for simple qualitative traits, for which the application of markers has been rather straightforward, for most complex quantitative traits the use of MAS is currently hampered by several impediments, including inaccurate QTL identification due to phenotyping difficulties, the large size of QTL intervals, lack of QTL validation, scarcity of reliable, closely linked markers, QTLs originating from distantly related wild species, population specificity of QTLs, and linkage drag. For example, for many quantitative traits, and in most vegetable crops, reported QTLs often encompass large genomic intervals including genes with undesirable phenotypic effects, thus prohibiting their effective use in breeding programs. These issues, of course, are not limited to vegetable crops, as they are present in most other crops as well, and have significantly impeded the use of MAS for many important agricultural traits. Before QTLs can be used effectively in widespread marker-assisted breeding, it is pivotal that efforts be made to fine-map and delineate QTLs to small intervals, validate QTLs and their associated markers across breeding populations, and identify reliable and reproducible closely linked markers. It is expected that once QTLs are better characterized and verified, the use of MAS will rapidly and progressively extend beyond simple Mendelian qualitative traits.

In addition to markers and MAS, newer techniques, such as pan-genome analysis (Khan et al., Citation2020), SNP-chip genotyping, and the CRISPR/Cas system, are inspiring many vegetable breeders to think and breed more holistically, yet more precisely. Traditionally, breeders often had to focus on one or a few traits at a time and breed for such traits for a long time. The conventional breeding protocols required development and phenotypic evaluation of hundreds of inbred lines and trialing of thousands of experimental F1 hybrids before identifying desirable hybrids with commercial value. With the use of new technologies, breeders may now design and develop inbred lines with numerous combinations of desirable characteristics based on predesigned molecular blueprints of the lines, and subsequently develop F1 hybrids with the most complementary combinations of traits coming from genetically and phenotypically selected parental lines. The advanced technologies may allow the development of elite inbred lines and commercial F1 hybrids with complex trait packages, for example with gene/QTL combinations that would fit grower and consumer demands for specific growing conditions, in a much shorter time and at substantially reduced cost. Further, the ever-increasing and more inclusive SNP chips will allow genotyping of breeding materials for numerous traits at very early developmental stages, which would significantly reduce the time needed to develop/identify elite inbred lines and complementary hybrid combinations.

In addition to the expected advancement in marker technology across vegetable crops, more reliable and efficient regeneration and genetic transformation systems with predictable and reproducible results must be available for most vegetable crops, which would allow more common use of gene editing techniques to not only better understand gene function but also facilitate precise genetic modification toward crop improvement. Advances in genome-editing tools will help drive vegetable crop breeding efforts and further reveal links between the nature of gene action and phenotypic performances.

Author contributions

Tomato section was written by MJ and MRF, pepper section was written by JV and B-CK, eggplant section was written by SL, lettuce section was written by IS, spinach section was written by GeB, cucumber section was written by YW, chicory section was written by GiB, all other sections were written by IS and MRF. MRF and IS comprehensively edited all sections. All authors read, edited, and approved the final manuscript.

Abbreviations
AFLP = amplified fragment length polymorphism
BC = backcross
BIL = backcross inbred line
BSA-seq = bulked segregant analysis sequencing
CAPS = cleaved amplified polymorphic sequence
CGA = chlorogenic acid
CGMS = cytoplasmic-genic male sterility
ChiVMV = chili veinal mottle virus
cM = centi Morgan
CMS = cytoplasmic male sterility
COS = conserved ortholog set
CRISPR/Cas = clustered regularly interspaced short palindromic repeat/CRISPR associated protein
CVYV = cucumber vein yellowing virus
DArTseq = diversity arrays technology sequencing
DM = downy mildew
DUS testing = distinctness, uniformity and stability testing
EMS = ethyl methanesulfonate
EST = expressed sequence tag
F locus = femaleness locus
FM = fresh market
GBS = genotyping by sequencing
GEBV = genomic estimated breeding values
GMO = genetically modified organism
GMS = genic male sterility
GO = gene ontology
GP1 = primary gene pool
GP2 = secondary gene pool
GRIN = Germplasm Resources Information Network
GS = genomic selection
GWAS = genome-wide association mapping
Hi-C = unbiased genome-wide chromatin conformation capture protocol using proximity ligation
HIGS = host-induced silencing
HRM = high resolution melting
IL = introgression line
InDel = insertion/deletion
LG = linkage group
LMV = lettuce mosaic virus
LNSV = lettuce necrotic stunt virus
MABC = marker-assisted backcrossing
MAGIC = multi-parental advanced generation inter-cross
MAP = modified atmosphere packaging
MARS = marker-assisted recurrent selection
MAS = marker-assisted selection
MLBVV = Mirafiori lettuce big-vein virus
mQTL = metabolomic quantitative trait locus
MS = male sterility
MutMap = approach to identify causative mutations responsible for a phenotype
MYA = million years ago
NAM = nested association mapping
NBS-LRR = nucleotide-binding site, leucine-rich repeat
NGS = next generation sequencing
NIL = near-isogenic line
NMS = nuclear male sterility
NUE = nitrogen use efficiency
OP = open pollinated
PCR = polymerase chain reaction
PePMov = pepper mottle virus
PROC = processing
PRR = pseudo-response regulator
PS = phenotypic selection
PVY = potato virus Y
QTL = quantitative trait locus
RAD-sequencing = restriction site associated DNA sequencing
RAPD = random amplified polymorphic DNA
RenSeq = resistance gene enrichment sequencing
RFLP = restriction fragment length polymorphism
RGA = resistance gene analog
RIL = recombinant inbred line
RKN = root-knot nematode
RNA-Seq = RNA-sequencing
RS = Ralstonia solanacearum
SCAR = sequence characterized amplified region
siRNA = small interfering RNA
SLAF-seq = specific locus amplified fragment sequencing
SolCAP = Solanaceae coordinated agricultural project
SSR = simple sequence repeat
STS = sequence-tagged site
SV = structural variants
TBSV = tomato bushy stunt virus
ToMV = tomato mosaic virus
TRAP = target region amplification polymorphism
TSWV = tomato spotted wilt virus
TuMV = turnip mosaic virus
TYLCV = tomato yellow leaf curling virus
XIS = Xishuangbanna
YAGO = year(s) ago

Acknowledgments

The mentioning of trade names or commercial products in this publication is solely to provide specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture (USDA) or the authors of this publication.

Disclosure statement

The authors declare that there is no conflict of interest.

References