1,741
Views
1
CrossRef citations to date
0
Altmetric
Computational Life Sciences, Bioinformatics and System Biology

QPromoters: sequence based prediction of promoter strength in Saccharomyces cerevisiae

, , , ORCID Icon, , & show all
Article: 2168304 | Received 12 Nov 2021, Accepted 10 Dec 2022, Published online: 20 Jan 2023

Abstract

Promoters play a key role in influencing transcriptional regulation for fine-tuning the expression of genes. Heterologous promoter engineering has been a widely used concept to control the level of transcription in all model organisms. The strength of a promoter is mainly determined by its nucleotide composition. Many promoter libraries have been curated, but few have attempted to develop theoretical methods to predict the strength of promoters from their nucleotide sequence. Such theoretical methods are not only valuable in the design of promoters with specified strength but are also meaningful in understanding the mechanistic role of promoters in transcriptional regulation. In this study, we present a theoretical model to describe the relationship between promoter strength and nucleotide sequence in Saccharomyces cerevisiae. We infer from our analysis that the −49–10 sequence with respect to the Transcription Start Site represents the minimal region that can be used to predict promoter strength. https://qpromoters.com/ and a standalone tool https://github.com/DevangLiya/QPromoters to quickly quantify the strength of Saccharomyces cerevisiae promoters.

View correction statement:
Correction

Author summary

Regulating gene expression is a crucial aspect of metabolic engineering and synthetic biology. Promoter engineering plays a vital part in modulating transcriptional capacity and hence controlling gene expression. While there are tools to identify promoter regions in the eukaryotic genome, there are no simple tools to predict the strength of promoters in eukaryotes. Previous studies have shown that there exists a relationship between the promoter strength and the natural log of the promoter score. We use this relationship to identify the minimal promoter region in Saccharomyces cerevisiae that can be used to predict the strength of promoters. We have used a set of 18 standard promoters whose strengths were experimentally determined in previous studies to verify our model. We were able to classify promoters into three broad classes, namely weak, moderate, and strong, with high confidence. We have also developed a website and an open-source script that can be utilized to quantify promoter strength in Saccharomyces cerevisiae and streamline the process of promoter design.

Introduction

Saccharomyces cerevisiae (S. cerevisiae), commonly known as brewer's yeast, is a widely used eukaryotic model organism in synthetic biology – it has applications in the production of biofuels, recombinant proteins and bulk chemicals (Nevoigt, Citation2008; Tang et al. Citation2020). Promoters are basic transcriptional elements that play a key role in manipulating genetic and metabolic pathways in S. cerevisiae by the regulation of protein expression both quantitatively and temporally (Scalcinati et al. Citation2012; Latimer et al. Citation2014) and are one of the most crucial component of yeast synthetic biology toolbox (Hubmann et al. Citation2014; Redden et al. Citation2015; Machens et al. Citation2017; Portela et al. Citation2017; Ottoz and Rudolf Citation2018; Decoene et al. Citation2019; Kotopka and Smolke Citation2020; Liu et al. Citation2020; Feng and Marchisio Citation2021). Promoters in S. cerevisiae have multiple components which together account for successful transcriptional regulation. The key components of a yeast promoter are an upstream activator sequence (UAS), an upstream repressor sequence (URS), a nucleosome-disfavoring sequence and a core promoter region.

The core promoter is the DNA sequence nearest to the start codon, which interacts with RNA polymerase II (pol-II) and other general transcriptional factors to form the pre-initiation complex (PIC) (Tang et al. Citation2020). The core region also contains the TATA box, the transcription start site (TSS), a PIC localization stretch and a TSS scanning region for pol-II (Lubliner et al. Citation2013). The binding of general transcription factor proteins and histones to the TATA box facilitates the subsequent binding of pol-II, which along with several transcription factor proteins, constructs a transcription initiation complex that starts the mRNA synthesis from the TSS (Kanhere and Bansal Citation2005; Jiang and Pugh Citation2009). The nucleotide composition of different regions in the core promoter strongly influences the sensitivity of the promoter. Studies have shown that promoters with A/T – or T/C-rich PIC regions have higher sensitivities than promoters containing G/C-rich sequences (Lubliner et al. Citation2015). The position of the different regions of the core promoter is illustrated in Figure .

Figure 1. A schematic of promoter architecture in S. cerevisiae: The text in crimson (top) denotes the conditions necessary for high sensitivity, while the green text (bottom) denotes the conditions for lower sensitivity. The length of the different regions of the promoter are also given in bp. The jagged line at the bottom denotes the part of the promoter that is rigid in nature.

Figure 1. A schematic of promoter architecture in S. cerevisiae: The text in crimson (top) denotes the conditions necessary for high sensitivity, while the green text (bottom) denotes the conditions for lower sensitivity. The length of the different regions of the promoter are also given in bp. The jagged line at the bottom denotes the part of the promoter that is rigid in nature.

The UAS and URS are the regulatory components of a promoter and are located upstream to the core promoter region. UASs and URSs act as binding sites for transcription activators and repressors, respectively. The UAS enhances gene expression, provides additional stability, and plays a role in regulating the PIC formation process (West et al. Citation1984; Bitter et al. Citation1991). The UASs and URSs in S. cerevisiae are typically 10 bp long but can vary from 5 to 30 bp in length (Stewart et al. Citation2012).

The disfavoring nucleotide sequence is a stretch of DNA that decreases nucleosome occupancy to facilitate transcription (Struhl and Segal Citation2013). Poly(dA:dT) tract, a homopolymeric stretch of deoxyadenosine nucleotides, is a well-known nucleosome-disfavoring sequence commonly present in promoters (Workman Citation2006).

The structural properties of the promoter are vital for successful transcription. The flexibility of the promoter should be optimal to make sure that the binding sites are accessible and properly positioned to enable their recognition by transcriptional machinery (Jiang and Pugh Citation2009). In this regard, the bendability, or the propensity of each tri-nucleotide to bend, is essential (Kanhere and Bansal Citation2005). Existing studies indicate the presence of regions of low bendability about 100-200 bp upstream to the start codon (Miele et al. Citation2008) (illustrated in Figure by a jagged line). Studies also indicate that the low bendability is caused by a combination of A/T richness and di- and tri-nucleotide composition (Akan and Deloukas Citation2008).

Promoters in S. cerevisiae can be either constitutive, that are relatively unaffected by internal and external signals and maintain stable levels of transcription, or inducible, which can initiate a drastic change in transcriptional levels in response to specific stimuli. These stimuli, called inducers, range from molecules such as metabolites, amino acids, and sugars to metal ions and environmental factors like pH and stress (Weinhandl et al. Citation2014; Gasser et al. Citation2015; Kim et al. Citation2015; Fischer et al. Citation2016; Rajkumar et al. Citation2016). Using endogenous promoters of S. cerevisiae for synthetic biology applications has disadvantages owing to an insufficiency of well-characterized promoters (Chen et al. Citation2018; Zhou et al. Citation2018). Thus, it is of utmost importance to characterize and quantify the strength of various S. cerevisiae promoters and create a database of the same.

Previous work has established a two-step approach for the quantitative prediction of the strength of promoters in Escherichia coli (E. coli), a prokaryotic model organism (Li and Zhang Citation2014; Bharanikumar et al. Citation2018). The linear relationship between the total promoter score and the promoter strength is well-established in E. coli (Berg and von Hippel Citation1987; Bharanikumar et al. Citation2018). In this study, we have presented a similar simplified model of the promoter strength in S. cerevisiae based on the promoter sequence.

We have done an extensive literature survey, and we observe that there are several established methods to quantify the strength of promoters (Rhodius Virgil and Mutalik Vivek Citation2010; Yada et al. Citation2011; Bharanikumar et al. Citation2018; Hayat et al. Citation2020; Zhao et al. Citation2020; Li et al. Citation2022; Zhao et al. Citation2022). We would like to highlight the fact that most of these methods deal with E. coli as the model organism. Our study is the only study that deals with the promoter sequences associated with eukaryotes to the best of our knowledge.

Materials and methods

The core promoter sequences of 5117 promoters in S. cerevisiae were retrieved using the Sequence Retrieval in Tool Eukaryotic Promoter Database (EPD) (Dreos et al. Citation2017). The core promoter sequence consisted of −49–10 sequences with reference to the Transcription Start Site (TSS).

A Position Frequency Matrix (PFM) was generated from the motif of all 5117 promoter sequences. The PFM was then converted to Position Weight Matrix (PWM) or Position-Specific Scoring Matrix (PSSM) using the functions from biopython (Cock et al. Citation2009). The motif landscape was visualized using WebLogo (Crooks et al. Citation2004). The resulting PSSM was then used to calculate the ‘promoter score’ for all the promoters in the downstream analysis.

Taking inspiration from the well-established linear relationship between the total promoter score and the promoter strength in E. coli (Berg and von Hippel Citation1987), we modeled the promoter strength using a linear model with the promoter score as,

Promoter strength = C0 + C1 × (Promoter score)

Lee et al. have characterized the strength of 19 constitutive promoters in S. cerevisiae using three fluorescence markers (Venus, mRuby2, and mTurquoise2) (Lee et al. Citation2015). We have used 18 of these promoter strengths in this study. The sequence for the promoter pREV1 was not found in EPD, due to which we have dropped it from our analysis. The normalized fluorescence values folded over the background were obtained from the authors of Lee et al. (Citation2015). The log of these values gives the promoter strength. These were further divided by the strength of the strongest promoter (pTDH3) from Lee et al. (Citation2015). We note that this step does not alter the linear relationship that is being tested but merely acts as a scaling. These values finally constitute the result space or the ‘Promoter strength’ in Eq. 1. We also have used another set of five promoters from Decoene et al. (Citation2019), as the training set to further test the robustness of our model (Decoene et al. Citation2019). These fluorescence values were also subjected to the normalizations described above.

We define a ‘segment score’ which is simply the total score of a given segment of a given promoter as calculated from PSSM. This score is then divided by the highest score (corresponding to pTDH3) to obtain the feature space or ‘Promoter score’ part of the Eq. 1.

We then performed Ordinary Linear Regression (OLS) using the statsmodels package in python. C0 and C1 were left as free parameters to obtain the best fit. The quality of fit was then assessed using reduced r-squared and F-statistic. The significance of model parameters were assessed using the t-statistic. We also fitted a linear model to the residues to look for biases in the model. Scipy, statsmodels and Seaborn packages were used to perform, visualize, and test the linear regression (Seabold and Perktold Citation2010).

Results and discussion

The 5117 native S. cerevisiae promoters from the EPD that were included in this study represent a diverse population of transcriptional regulators. Motif analysis on this set, shown in Figure , revealed that the promoters were diverse in terms of nucleotide composition. We see that the conservation along the entire promoter length is low, except for the TSS. Based on this, there are two possible regions or segments of the −49–10 sequence that can be considered for modeling the linear relationship in Eq. 1. 1) Highly conserved −9–1 segment, (2) −49 to X segment where X varies from −48 to 10. Results of modelling the promoter strength using these segments are discussed in the following sections along with their possible biological implications.

Figure 2. Motif of −49–10 region: Motif logo generated from all 5117 S. cerevisiae promoters from Eukaryotic Promoter Database. The motif is generated for −49–10 sequence with respect to the Transcription Start Site.

Figure 2. Motif of −49–10 region: Motif logo generated from all 5117 S. cerevisiae promoters from Eukaryotic Promoter Database. The motif is generated for −49–10 sequence with respect to the Transcription Start Site.

−49 to X region

We sought to determine the shortest sequence that could model the relationship between experimental promoter strength and the segment scores. We first fix one end of such a segment at −49 position with respect to TSS and add nucleotides towards TSS until position ‘X’. Scores of the segments thus obtained were used to perform a linear regression described by the Eq. 1. The quality of fit estimators for each such regression is shown in Figure . It was observed that the quality of fit generally improves as more and more nucleotides are added. This is indicated by larger R-squared values and lower p-values for F-statistic. A saturation is reached at X = −1, after which adding more nucleotides does not improve the quality of fit by an appreciable amount, as shown in Figure and S2).

Figure 3. Various fit statistics for the linear regression of segment scores against the mRuby2 fluorescence: One of the ends of the promoter is fixed at −49 and nucleotides are added on the other end towards the TSS. The values of R-squared, Adj. R-squared, and p-value for F-statistic are tabulated in Table S1. Similar plots for Venus and mTurquoise2 fluorescence are given in Fig S2.

Figure 3. Various fit statistics for the linear regression of segment scores against the mRuby2 fluorescence: One of the ends of the promoter is fixed at −49 and nucleotides are added on the other end towards the TSS. The values of R-squared, Adj. R-squared, and p-value for F-statistic are tabulated in Table S1. Similar plots for Venus and mTurquoise2 fluorescence are given in Fig S2.

This saturation is sustained until X = 10, indicating that the quality of fit from −49 to −1 and −49–10 is mostly similar (Figures S4–S6). Since the −49–10 sequence was readily available in the EPD, further analysis focused on this 60 bp segment to ease the integration with the EPD. Figure shows the plot of normalized −49–10 scores and normalized mRuby2 fluorescence along with the best fit model. We see that the residues for this model are randomly distributed around 0, indicating that the errors are uncorrelated, and the quantile-quantile plot shows that errors are normally distributed, thus validating all the assumptions for linear regression. Similar trends were observed using Venus and mTurquoise2 fluorescence as seen in Figures S2 and S3, respectively.

Figure 4. Best fit model for −49–10 segment: (A) Plot of normalized −49–10 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residues obtained from the best fit model. (C) Quantile-Quantile plot of residues against normally distributed theoretical quantiles.

Figure 4. Best fit model for −49–10 segment: (A) Plot of normalized −49–10 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residues obtained from the best fit model. (C) Quantile-Quantile plot of residues against normally distributed theoretical quantiles.

−9–1 region

Motif analysis shown in Figure showed that the −9–1 region to be the most conserved stretch. Previous work suggests that conserved sequences contribute significantly to the binding specificity of the promoter region (Berg and von Hippel Citation1987). As mentioned earlier, a high binding specificity is indicative of high promoter strength. Consequently, we sought to determine whether this hypothesis holds true for the promoters included in our analysis. The quality of fit observed in this case, however, is extremely poor as seen in the Figure . The p-value of F-statistic is 0.54 for regression using mRuby2 strengths. Therefore, we cannot reject the null hypothesis that there is no relationship between −9–1 score and promoter strength. This demonstrates that the above argument is not a dominant mechanism in defining the promoter strength.

Figure 5. Best fit model for −9–1 segment: (A) Plot of normalized −9–1 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residues obtained from the best fit model. (C) Quantile-Quantile plot of residues against normally distributed quantiles.

Figure 5. Best fit model for −9–1 segment: (A) Plot of normalized −9–1 score and normalized mRuby2 fluorescence along with the best fit model. (B) Residues obtained from the best fit model. (C) Quantile-Quantile plot of residues against normally distributed quantiles.

Values of model parameters

We found that the value of intercept C0 in the model is close to zero for the best fit model and the value of slope C1 was between 0.8 and 1.1. Figure shows the best fit values of C0 and C1 for different fluorescence along with their errors. Values of C0 and C1 obtained using mTorquoise2 fluorescence were slightly different from those obtained using Venus or mRuby2 fluorescence. These differences likely stem from the stochastic gene expression and noisy fluorescence signals in the experiments. However, they agree with each other within error bars. The value of C1 using the Decoene et al. (Citation2019) dataset was within the 1-sigma error bars of Venus and mRuby2 fluorescence. The value of C0 differs significantly for this dataset, but we note that this is likely due to different experimental conditions and does not change the relative strengths of other promoters (Figures S7 and S8). The mean values of C0 and C1 are 0.025 and 0.87, respectively.

Figure 6. Best fit values of model parameters: The best fit values of C0 and C1 obtained using different fluorescence data as a proxy for promoter strength. The black dot shows the mean value of C0 and C1 weighted by error bars. The data corresponding to this plot can be found in Table S2.

Figure 6. Best fit values of model parameters: The best fit values of C0 and C1 obtained using different fluorescence data as a proxy for promoter strength. The black dot shows the mean value of C0 and C1 weighted by error bars. The data corresponding to this plot can be found in Table S2.

QPromoters software

We have developed an open-source, free-to-use standalone tool and a website to use our findings to predict the strength of S. cerevisiae promoters. The standalone tool can be found at https://github.com/DevangLiya/QPromoters, and the website can be found at https://qpromoters.com/. Users can either enter the −49–10 sequence of the engineered promoter or can retrieve this sequence directly from EPD by entering the EPDnew ID of the promoter. The tool then outputs the promoter score, promoter score normalized by pTDH3 score, promoter strength using the model described by equation Eq. 1, a plot showing the location of the user's promoter with reference to the 18 characterized promoters, and a histogram showing the location of the user's promoter with reference to all 5117 EPD promoters. An example of the figure returned by the program is shown in Figure .

Figure 7. A sample output from QPromoters application: (A) −49–10 score of the user's promoter is shown as a horizontal line on a plot of scores of the characterized promoters from Lee et al. (Citation2015) (B) Arrow shows the score of the user's promoter in reference to scores of all 5117 S. cerevisiae promoters in EPD.

Figure 7. A sample output from QPromoters application: (A) −49–10 score of the user's promoter is shown as a horizontal line on a plot of scores of the characterized promoters from Lee et al. (Citation2015) (B) Arrow shows the score of the user's promoter in reference to scores of all 5117 S. cerevisiae promoters in EPD.

Conclusion

We analyzed the core promoter region of −49–10, with respect to TSS, to find a simple correlation between the score of this region and the experimental promoter activity. Analysis of the core promoter region revealed that there is a correlation between the promoter score and experimental promoter strength. Particularly, the −49–10 region of the core promoter was seen to be the best predictor of the promoter strength. We also observed a similar quality of fit between the −49 to −1 and −40–10 regions. The biological basis and significance of this sustained quality of fit needs further investigation. In addition to these findings, we have developed an open-source, free-to-use tool to predict the promoter strength of unknown promoters in S. cerevisiae.

Using computational tools to determine the essentiality of genes and strength of gene regulatory elements is of significant use in synthetic biology, as this tool will help in constructing recombinant circuits. This tool would also be helpful in experiments where fine-tuned regulation of gene expression is required and in studies involving transcription kinetics where characterizing promoter strength might be required. Consequently, these in-silico methods can precede and lower the risk of failure in in vivo experiments. Moreover, our web tool is useful in characterizing the strength of existing promoters in the EPD as well as predicting the strength of other engineered S. cerevisiae promoters on the basis of the promoter sequence.

Supplemental material

Supplemental Material

Download Zip (2.8 MB)

Acknowledgments

The authors thank Shubham Kumar Sinha and Swaroopa Nakkeeran from the Indian Institute of Science Education and Research, Mohali, for their comments and suggestions throughout the work. AKJ is supported by Clarendon Fund (http://www.ox.ac.uk/clarendon/about), SKP scholarship, Exeter College (https://www.exeter.ox.ac.uk/wp-content/uploads/2019/09/SKP-2020.pdf) and UKRI-BBSRC grant BB/M011224/1, Oxford Interdisciplinary Bioscience DTP at the University of Oxford. DHL and NMA are supported by the INSPIRE scholarship (https://www.online-inspire.gov.in/). AKJ conceptualized and designed the project. DHL curated the data and worked on the formal analysis to quantify the promoter strength. SS helped organizing and standardizing the datasets. NMA and NB did the promoter landscape analysis and visualizations. ME and SS mined the S. cerevisiae promoters’ fluorescence data from published datasets and resources for validating the tool. MP designed web implementation and provided technical assistance. ME helped in testing the tool and improving the user experience. AKJ, SS, DHL, and ME wrote the manuscript. All the authors have read and approved this manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability

The fluorescence data used in this work are openly available in the following publications: Lee et al. (Citation2015) and Decoene et al. (Citation2019) that issue datasets with DOIs. The standalone tool for predicting the promoter strength is open-source and available at https://github.com/DevangLiya/QPromoters. The online version of this tool can be accessed at https://qpromoters.com/.

Correction Statement

This article was originally published with errors, which have now been corrected in the online version. Please see Correction (http://dx.doi.org/10.1080/26895293.2023.2208404)

Additional information

Funding

The authors did not receive funding from any source for this work and was carried out of scientific interest.

References

  • Akan P, Deloukas P. 2008. DNA sequence and structural properties as predictors of human and mouse promoters. Gene. 410:165–176. doi:10.1016/j.gene.2007.12.011.
  • Berg OG, von Hippel PH. 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. Journal of Molecular Biology. 193:723–743. doi:10.1016/0022-2836(87)90354-8.
  • Bharanikumar R, Premkumar KAR, Palaniappan A. 2018. PromoterPredict: sequence-based modelling of Escherichia coli sigma(70) promoter strength yields logarithmic dependence between promoter strength and sequence. PeerJ. 6:e5862. doi:10.7717/peerj.5862.
  • Bitter GA, Chang KK, Egan KM. 1991. A multi-component upstream activation sequence of the Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase gene promoter. Molecular and General Genetics MGG. 231:22–32. doi:10.1007/BF00293817.
  • Chen X, Gao C, Guo L, Hu G, Luo Q, Liu J, Nielsen J, Liu L. 2018. Dceo biotechnology: tools to design, construct, evaluate, and optimize the metabolic pathway for biosynthesis of chemicals. Chemical Reviews. 118:4–72. doi:10.1021/acs.chemrev.6b00804.
  • Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423. doi:10.1093/bioinformatics/btp163.
  • Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Research. 14:1188–1190. doi:10.1101/gr.849004.
  • Decoene T, De Maeseneire SL, De Mey M. 2019. Modulating transcription through development of semi-synthetic yeast core promoters. PLOS ONE. 14:e0224476. doi:10.1371/journal.pone.0224476.
  • Dreos R, Ambrosini G, Groux R, Cavin Perier R, Bucher P. 2017. The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms. Nucleic Acids Research. 45:D51–D55. doi:10.1093/nar/gkw1069.
  • Feng X, Marchisio MA. 2021. Saccharomyces cerevisiae promoter engineering before and during the synthetic biology era. Biology (Basel). 10(6):504. doi:10.3390/biology10060504.
  • Fischer S, Engstler C, Procopio S, Becker T. 2016. EGFP-based evaluation of temperature inducible native promoters of industrial ale yeast by using a high throughput system. LWT - Food Science and Technology. 68:556–562. doi:10.1016/j.lwt.2015.12.020.
  • Gasser B, Steiger MG, Mattanovich D. 2015. Methanol regulated yeast promoters: production vehicles and toolbox for synthetic biology. Microbial Cell Factories. 14:196. doi:10.1186/s12934-015-0387-1.
  • Hayat M, Gul S, Chong KT. 2020. An intelligent computational model for prediction of promoters and their strength via natural language processing. Chemometrics and Intelligent Laboratory Systems. 202:104034. doi:10.1016/j.chemolab.2020.104034.
  • Hubmann G, Thevelein JM, Nevoigt E. 2014. Natural and modified promoters for tailored metabolic engineering of the yeast Saccharomyces cerevisiae. Methods Molecular Biology. 1152:17–42. doi:10.1007/978-1-4939-0563-8_2.
  • Jiang C, Pugh BF. 2009. Nucleosome positioning and gene regulation: advances through genomics. Nature Reviews Genetics. 10:161–172. doi:10.1038/nrg2522.
  • Kanhere A, Bansal M. 2005. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Research. 33:3165–3175. doi:10.1093/nar/gki627.
  • Kim S, Lee K, Bae SJ, Hahn JS. 2015. Promoters inducible by aromatic amino acids and γ-aminobutyrate (GABA) for metabolic engineering applications in Saccharomyces cerevisiae. oApplied Microbiology and Biotechnology. 99:2705–2714. doi:10.1007/s00253-014-6303-5.
  • Kotopka BJ, Smolke CD. 2020. Model-driven generation of artificial yeast promoters. Nat Commun. 11:2113. doi:10.1038/s41467-020-15977-4.
  • Latimer LN, et al. 2014. Employing a combinatorial expression approach to characterize xylose utilization in Saccharomyces cerevisiae. Metabolic Engineering. 25:20–29. doi:10.1016/j.ymben.2014.06.002.
  • Lee ME, DeLoache WC, Cervantes B, Dueber JE. 2015. A highly characterized yeast toolkit for modular, multipart assembly. ACS Synthetic Biology. 4:975–986. doi:10.1021/sb500366v.
  • Li H, Shi L, Gao W, Zhang Z, Zhang L, Zhao Y, Wang G. 2022. dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost. Methods. 215–222. doi:10.1016/j.ymeth.2022.01.001.
  • Li J, Zhang Y. 2014. Relationship between promoter sequence and its strength in gene expression. The European Physical Journal E. 37:44. doi:10.1140/epje/i2014-14044-y.
  • Liu R, Liu L, Li X, Liu D, Yuan Y. 2020. Engineering yeast artificial core promoter with designated base motifs. Microbial Cell Factories. 19:38. doi:10.1186/s12934-020-01305-4.
  • Lubliner S, et al. 2015. Core promoter sequence in yeast is a major determinant of expression level. Genome Research. 25:1008–1017. doi:10.1101/gr.188193.114.
  • Lubliner S, Keren L, Segal E. 2013. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Research. 41:5569–5581. doi:10.1093/nar/gkt256.
  • Machens F, Balazadeh S, Mueller-Roeber B, Messerschmidt K. 2017. Synthetic promoters and transcription factors for heterologous protein expression in Saccharomyces cerevisiae. Frontiers in Bioengineering and Biotechnology. 5:63. doi:10.3389/fbioe.2017.00063.
  • Miele V, Vaillant C, d'Aubenton-Carafa Y, Thermes C, Grange T. 2008. DNA physical properties determine nucleosome occupancy from yeast to fly. Nucleic Acids Research. 36:3746–3756. doi:10.1093/nar/gkn262.
  • Nevoigt, E. 2008. Progress in metabolic engineering of Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews. 72:379–412. doi:10.1128/MMBR.00025-07.
  • Ottoz DSM, Rudolf F. 2018. Constitutive and regulated promoters in yeast: how to design and make use of promoters in S. cerevisiae. Synthetic Biology. 107–130. doi:10.1002/9783527688104.ch6.
  • Portela RMC, Vogl T, Kniely C, Fischer JE, Oliveira R, Glieder A. 2017. Synthetic core promoters as universal parts for fine-tuning expression in different yeast species. ACS Synthetic Biology. 6:471–484. doi:10.1021/acssynbio.6b00178.
  • Rajkumar AS, Liu G, Bergenholm D, Arsovska D, Kristensen M, Nielsen J, Jensen MK, Keasling JD. 2016. Engineering of synthetic, stress-responsive yeast promoters. Nucleic Acids Research. 44:e136. doi:10.1093/nar/gkw553.
  • Redden H, Morse N, Alper HS. 2015. Editorial: Yeast synthetic biology: new tools to unlock cellular function. FEMS Yeast Research. 15:1–1. doi:10.1111/1567-1364.12188.
  • Rhodius Virgil A, Mutalik Vivek K. 2010. Predicting strength and function for promoters of the Escherichia coli alternative sigma factor, σE. Proceedings of the National Academy of Sciences. 107:2854–2859. doi:10.1073/pnas.0915066107.
  • Scalcinati G, Knuf C, Partow S, Chen Y, Maury J, Schalk M, Daviet L, Nielsen J, Siewers V. 2012. Dynamic control of gene expression in Saccharomyces cerevisiae engineered for the production of plant sesquitepene α-santalene in a fed-batch mode. Metabolic Engineering. 14:91–103. doi:10.1016/j.ymben.2012.01.007.
  • Seabold S, Perktold J. 2010. Statsmodels: econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference 2010.
  • Stewart AJ, Hannenhalli S, Plotkin JB. 2012. Why transcription factor binding sites are ten nucleotides long. Genetics. 192:973–985. doi:10.1534/genetics.112.143370.
  • Struhl K, Segal E. 2013. Determinants of nucleosome positioning. Nature Structural & Molecular Biology. 20:267–273. doi:10.1038/nsmb.2506.
  • Tang H, Wu Y, Deng J, Chen N, Zheng Z, Wei Y, Luo X, Keasling JD, et al. 2020. Promoter architecture and promoter engineering in Saccharomyces cerevisiae. Metabolites. 10:320. doi:10.3390/metabo10080320.
  • Weinhandl K, Winkler M, Glieder A, Camattari A. 2014. Carbon source dependent promoters in yeasts. Microbial Cell Factories. 13:5. doi:10.1186/1475-2859-13-5.
  • West RW Jr., Yocum RR, Ptashne M. 1984. Saccharomyces cerevisiae GAL1-GAL10 divergent promoter region: location and function of the upstream activating sequence UASG. Molecular and Cellular Biology. 4:2467–2478. doi:10.1128/mcb.4.11.2467-2478.1984.
  • Workman JL. 2006. Nucleosome displacement in transcription. Genes & Development. 20:2009–2017. doi:10.1101/gad.1435706.
  • Yada T, et al. 2011. Linear regression models predicting strength of transcriptional activity of promoters. Genome Informatics. 25:53–60. doi:10.11234/gi.25.53.
  • Zhao M, Yuan Z, Wu L, Zhou S, Deng Y. 2022. Precise prediction of promoter strength based on a de novo synthetic promoter library coupled with machine learning. ACS Synthetic Biology. 11:92–102. doi:10.1021/acssynbio.1c00117.
  • Zhao M, Zhou S, Wu L, Deng Y. 2020. Model-driven promoter strength prediction based on a fine-tuned synthetic promoter library in Escherichia coli. bioRxiv. doi:10.1101/2020.06.25.170365.
  • Zhou Y, Li G, Dong J, Xing X-h, Dai J, Zhang C. 2018. MiYA, an efficient machine-learning workflow in conjunction with the YeastFab assembly strategy for combinatorial optimization of heterologous metabolic pathways in Saccharomyces cerevisiae. Metabolic Engineering. 47:294–302. doi:10.1016/j.ymben.2018.03.020.