Point-of-View

Transcription pausing: biological significance of thermal fluctuations biased by repetitive genomic sequences

Pages 196-203 | Received 05 Jul 2017, Accepted 11 Oct 2017, Published online: 01 Dec 2017

ABSTRACT

Transcription of DNA by RNA polymerase (RNAP) takes place in a cell environment dominated by thermal fluctuations. How are transcription reactions including initiation, elongation, and termination on genomic DNA so well-controlled during such fluctuations? A recent statistical mechanical approach using high-throughput sequencing data reveals that repetitive DNA sequence elements embedded into a genomic sequence provide the key mechanism to functionally bias the fluctuations of transcription elongation complexes. In particular, during elongation pausing, such repetitive sequence elements can increase the magnitude of one-dimensional diffusion of the RNAP enzyme on the DNA upstream of the pausing site, generating a large variation in the dwell times of RNAP pausing under the control of these genomic signals.

Introduction of statistical approach: why is it necessary?

When one characterizes the function of a biomolecular complex, determining a single particular energy state or a single particular structure of the complex does not always tell us about its function. This is because in the context of different cellular compartments with sizes ranging from tens to hundreds of nanometers, fluctuations of energy (or a structure) are not negligible but are essential in determining regulatory mechanisms [Citation1].

The term fluctuations that we are using throughout this review means stochastic (i.e. random) variations of microscopic parameters of a biomolecular system due to the presence of thermal noise. Thermal noise is usually defined as random or nearly random, diffusive motion of molecules (e.g. water molecules, small organic molecules, lipids, peptides, proteins, nucleic acids, and their complexes etc.) constituting a living cell [Citation1]. For example, due to the presence of thermal noise, a thermally fluctuating biomolecular complex maintains conformational heterogeneity, i.e. adopts multiple conformational states or structures rather than a single structure. In this review, we use the term microscopic state in order to describe a particular conformational state of a biomolecular complex. Such conformational diversity represents a biomolecular complex's dynamic nature, which is necessary for performing its functions, which might include catalysis of chemical reactions, or providing a net directional motion along DNA. Our review stresses the key point that within a cell environment, thermal fluctuations are not entirely random but rather biased. In order to define the term biased thermal fluctuations, let us consider an example of a transcription factor searching for its specific binding site along the genomic DNA molecule [Citation2]. In this case, diffusion of a transcription factor on DNA is not entirely random, but it is rather affected by the DNA sequence in a highly non-random way [Citation3]. Such a non-random influence on random fluctuations of biomolecular complexes is termed here biased fluctuations. This review discusses the effect of biased thermal fluctuations on the process of DNA transcription by RNA polymerase (RNAP), especially on the pausing of the RNAP elongation complex.

The use of high-throughput sequencing (HT-seq) enables us to analyze tens of thousands of nucleic-acid-RNAP complexes in a single experiment [Citation4]. This approach provides a comprehensive snapshot of the entire ensemble of different microscopic states (i.e. microscopic heterogeneity) of the active transcription elongation complex, consisting of DNA, RNA-DNA hybrid, and RNAP. Such data enable us to evaluate the effect of microscopic heterogeneity and DNA sequence features on the enzymatic function of RNAP. In particular, we have recently investigated the transcription elongation complex genome-wide in E. coli [Citation4,Citation5]. Our HT-seq data identify nascent RNA transcripts whose 3’ ends are associated with RNAP during elongation and provide the probability distribution of multiple states of RNAP elongation complexes. These data enable us to identify paused RNAPs at single base-pair resolution along the entire genomic DNA sequence [Citation4,Citation6].

Concepts of statistical mechanics can provide a theoretical framework for handling HT-seq data that contain information on the microscopic heterogeneity of a biomolecular complex. In particular, statistical mechanics enables us to connect the entire ensemble of different microscopic states to macroscopic quantities, such as the average RNAP density at a particular genomic coordinate, assuming thermodynamic equilibrium [Citation5]. Our statistical mechanics approach stresses the key fact that the measured average RNAP density at a given genomic location depends on the DNA context surrounding this location [Citation5]. On the other hand, concepts of chemical kinetics and thermodynamics, which have historically been applied to modeling biomolecular reactions, need to assume homogeneity in the system [Citation7]. Therefore, such approaches would be able to interpret biological mechanisms involving microscopic heterogeneity only when connected with statistical mechanical concepts. Although the importance of thermal fluctuations and microscopic heterogeneity for the molecular mechanisms of many biological reactions has been recognized [Citation1], to the best of our knowledge, statistical mechanical concepts have not been applied to understanding the mechanism of RNAP pausing during transcription elongation.

In order to tackle this issue, we have developed the concept of non-consensus protein-DNA binding [Citation3]. This concept allows us to describe microscopic heterogeneity stemming from thermal fluctuations as an entropy-dominated free energy, which strongly depends on repetitive DNA sequence elements associated with a protein. Such a statistical fluctuation approach allows us to achieve a quantitative, probabilistic differentiation between the statistical ensembles of two groups of DNA sequences, with high and low affinities to a specific target protein. In particular, one group of DNA sequences with high affinity to a specific protein has extensive repetitive DNA sequence elements (characterized by a relatively low average free energy of non-consensus protein-DNA binding); another group of DNA sequences with low affinity to the same protein has no repetitive sequence elements (characterized by a relatively high average free energy) [Citation5]. We have shown the significance of multiple functional conformational states of a biomolecular system, which contrasts with the commonly adopted view that thermal fluctuations destabilize specific, structural states of a biomolecular system [Citation5].

Biology of transcription pausing

One may be comfortable when riding on a train, even though the train moves very fast. This is because the movement of a train is a homogeneous rail tracking, where every step of the movement is exactly the same (relative to the size of a train). On the other hand, one would be extremely uncomfortable when riding on an RNAP transcribing RNA along DNA, because during elongation, the net speed of RNAP's forward movement depends on DNA sequence heterogeneity, i.e. the speed of the RNAP movement per base is not constant but it is rather characterized by some probabilistic distribution. We have defined such RNAP movement as heterogeneous DNA tracking [Citation8]. At certain positions, the RNAP moves slowly with a dwell time per base of ∼1 sec or longer, and this is defined as transcription pausing [Citation9–11].

Transcription pausing is a widespread phenomenon [Citation8,Citation12,Citation13]. In promoter-proximal regions, RNAP pausing is observed in 10–40% of Drosophila genes [Citation14–16] and 10–20% of E. coli genes [Citation17]. There is a long research history in studying the phenomena of transcription pausing in both eukaryotic and prokaryotic cells. Those studies are also closely linked with the development of high-throughput techniques analyzing the nascent 3’ RNA ends of transcripts associated with transcribing RNAP [Citation12,Citation18]. Recent HT-seq studies from E. coli to human cells reveal an increased density of pausing not only in promoter-proximal regions but also at several sites through the length of a gene, both in sense and anti-sense orientations [Citation4,Citation6,Citation19,Citation20].

Accumulating lines of evidence suggest that one of the major roles for RNAP pausing in promoter-proximal regions is to maintain active promoters in highly expressed genes. In eukaryotes, when RNAPII stably pauses just downstream of a promoter, it keeps chromatin open [Citation13]. Thus a dynamic equilibrium between open and closed, bistable chromatin states in the vicinity of the promoter is biased toward an open state, which can be essential for providing sufficient time to associate regulatory transcription factors with the promoter and with the paused RNAPII during transcription initiation [Citation21]. Another important role of RNAP pausing, which is observed outside of promoter-proximal regions, is a kinetic coupling of transcription to other chromosome machines, such as the spliceosome in eukaryotic cells [Citation12,Citation22]. For instance, pausing of RNAPII due to a roadblock by an eleven zinc finger protein affects co-transcriptional alternative splicing in the CD45 gene, and increases exon inclusion frequency [Citation23].

Transcription pausing does not always play positive regulatory roles. One negative consequence of pausing is genome rearrangements, which are highly toxic to cells [Citation24–26]. Pausing can be long-lived when it is stabilized by backtracking of RNAP. Backtracking is enhanced when RNAP misincorporates a nucleotide that is noncomplementary to the template DNA, or when it encounters protein roadblocks on DNA [Citation8,Citation27,Citation28]. During backtracking, the 3’ end of the nascent RNA is disengaged from the active site of RNAP. Such an event profoundly stabilizes the paused state [Citation29,Citation30]. This stably paused complex will collide with other proteins that are moving along the genomic DNA, such as the DNA replication complex [Citation31,Citation32]. Collisions of paused transcription complexes with replisomes are frequent occurrences that cause DNA damage, usually via the formation of R-loops, molecular structures composed of an RNA-DNA hybrid and a looped-out single DNA strand [Citation24,Citation31,Citation32]. The R-loop is sensitive to DNA breaks and mutations in the single-stranded DNA region. Surprisingly, recent sequencing studies estimate that R-loops occupy 5% or more of the length of the genome in some cell types [Citation33–35]. Accumulation of R-loops leads to human diseases including cancer development and neurodegeneration [Citation36]. Maintenance of genome integrity is a crucial issue for both prokaryotic and eukaryotic cells, and hence, both types of cells are equipped with several different co-transcriptional mechanisms that minimize transcription-replication conflicts [Citation37].

In prokaryotes, a translating ribosome on the nascent transcript behind RNAP may prevent many of the pervasive RNAP pauses that are stabilized by backtracking [Citation38]. In particular, the leading ribosome on the nascent RNA behind RNAP may always prevent RNAP from moving backwards on the RNA [Citation39]. In the absence of ribosomes and of translation coupled with transcription, RNAP can backtrack for a long distance. Despite this, recent genome-wide studies that identified a considerable number of backtracked pauses [Citation4], and likely resultant R-loops formed within the coding regions of prokaryotic genes [Citation33], suggest that the association of translating ribosomes with the 5’ end of nascent RNA may not always be sufficient to prevent spontaneous RNAP backtracking.

What factors discriminate between physiologically-relevant pausing and physiologically-irrelevant pausing?

Transcription pausing stabilized by RNAP backtracking appears to have physiological roles for specific regulatory purposes [Citation40,Citation41]. However, pausing, if it ubiquitously occurs throughout transcribed regions in the genome, could be harmful since it provides a source of transcription-replication conflicts [Citation37]. What factors affect pausing in a cell? What factors help to suppress excessive pausing events from occurring in a cell?

Recently, we and other research groups independently developed methods allowing the genome-wide analysis of transcription pausing, using E. coli cells as a model system. Using these methods, we and others identified a highly conserved pause-inducing element (PIE) [Citation4,Citation10,Citation42]. The conserved motifs of the PIE, represented as GNNNNNNTGCG, are spread out along the entire length of the RNA-DNA hybrid [Citation4]. It has been shown that the PIE sequence can impede forward translocation of RNAP, as well as the following NTP addition step, to cause pausing of ∼1 sec or longer duration. These long pauses are detectable by HT-seq experiments [Citation4]. However, deeper analysis demonstrated that the presence of the PIE sequence alone is not sufficient to cause RNAP pausing on the E. coli chromosome; an additional genome-wide mechanism is necessary [Citation5]. We then showed that thermal fluctuations biased by repetitive DNA sequences constitute the global mechanism determining the fate of RNAP pausing [Citation5].
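As an illustration of what a consensus-level PIE search looks like, the following is a minimal sketch that scans one DNA strand for the pattern GNNNNNNTGCG. The function name `find_pie_candidates` is hypothetical, and the scan is only an assumption-laden stand-in for the position-weight-matrix analysis in [Citation4,Citation5]: it reports motif matches, which, as discussed in this section, are not by themselves evidence of pausing.

```python
import re

# Consensus PIE as written in the text: G, six arbitrary bases, then TGCG (11 nt).
# A lookahead is used so that overlapping matches are all reported.
PIE_PATTERN = re.compile(r"(?=(G[ACGT]{6}TGCG))")

def find_pie_candidates(non_template_seq):
    """Return (0-based start, matched 11-mer) for each consensus match.

    Hypothetical helper: scans only the strand given (assumed here to be
    the non-template strand, written 5' to 3').
    """
    seq = non_template_seq.upper()
    return [(m.start(), m.group(1)) for m in PIE_PATTERN.finditer(seq)]

if __name__ == "__main__":
    for start, motif in find_pie_candidates("AAGACGTACTGCGTT"):
        print(start, motif)
```

To cover both transcription orientations, one would run the same scan on the reverse complement; that step is omitted here to keep the sketch short.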

Figure 1. Transcription pausing at PIEs is determined by biased thermal fluctuations. The average free energy per bp <f> is computed using statistical mechanical modeling in the interval (–80,80) around the pause sites at true and false PIEs, respectively, as well as for the group of control sequences (see [Ref. Citation5] for p-values calculations and for other computational details). On the top, the 11-nt PIE of non-template DNA strand is depicted as the RNA (blue), DNA (gray) and RNAP (pink oval). The 3' RNA end at the active site (white rectangle) corresponds to the pause site.


In order to investigate the PIE-independent mechanism for pausing in E. coli, we defined “false” PIEs as sequences whose information content [Citation43] (i.e. the index of base conservation) is as high as that of “true” PIEs, but for which no reliable pausing was experimentally detected [Citation5]. Therefore, the specific consensus motif characterizing the true PIE group is the same as the corresponding consensus motif for the false PIE group, even though pausing events are detected in the former group but not in the latter. Surprisingly, we showed that true PIEs constitute only 3% of these DNA sequences, while the remaining 97% are false PIEs. Most importantly, we found that DNA sequences flanking true PIEs contain more repetitive sequence elements than the sequences flanking false PIEs, even though there is no significant difference in the average GC content of those flanking sequences between true and false PIEs [Citation5].
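The information content mentioned above can be sketched as follows, assuming the standard Shannon definition for a four-letter alphabet (2 bits minus the per-position entropy, with no small-sample correction). The exact definition used for the true/false PIE comparison is the one given in [Citation43]; the function name `information_content` is hypothetical.

```python
import math
from collections import Counter

def information_content(aligned_seqs):
    """Per-position information content (bits) of equal-length DNA sequences.

    Assumes the uncorrected Shannon form: IC = 2 + sum_b p_b * log2(p_b).
    A fully conserved position gives 2 bits; a uniform one gives 0 bits.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must have equal length")
    n = len(aligned_seqs)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned_seqs)
        # Only observed bases contribute; p * log2(p) tends to 0 as p tends to 0.
        entropy_term = sum((c / n) * math.log2(c / n) for c in counts.values())
        profile.append(2.0 + entropy_term)
    return profile

if __name__ == "__main__":
    # Toy alignment: first position conserved, second position variable.
    print(information_content(["GA", "GC", "GG", "GT"]))
```

Two groups of sequences can share the same profile under this measure while differing in their flanking regions, which is the situation described for true and false PIEs.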

Control of transcription pausing by repetitive genomic sequences

How do repetitive sequences enhance pausing? An intuitive explanation follows. Similar to our past work modeling protein-DNA interactions, here we assume that the binding energy of RNAP to DNA per base is dominated by thermal fluctuations [Citation3,Citation44–49]. Thus, even though RNAP may pause at a PIE, thermal, diffusive motion (i.e. sliding) of the paused complex on DNA determines the overall stability of the paused state. In other words, the stability of the paused state is determined by the overall free energy of the elongation complex (dominated by entropy) rather than by the binding energy of a unique “static” state. When DNA sequences located upstream of consensus PIEs are enriched with certain repetitive elements, the number of possible conformational states of a paused complex will increase [Citation5]. As a result, statistically on average, upon pausing, RNAP has a higher probability to diffuse backwards, upstream of such PIEs, towards DNA sequences enriched in repetitive elements. We use the term diffusional backtracking stabilized by biased thermal fluctuations to describe this effect. This intuitive explanation is indeed validated by our quantitative statistical mechanical modeling [Citation44]. In particular, in order to estimate the statistical magnitude of non-consensus RNAP binding to DNA along the entire E. coli genome, we computed the free energies of non-consensus RNAP-DNA binding for three groups of DNA sequences: those surrounding true PIEs, those surrounding false PIEs, and a control non-PIE group. Both the experimentally determined true PIEs and the predicted false PIEs consist of very similar 11-nt-long consensus motifs represented by GNNNNNNTGCG, located in the center of the sequences. The control group, consisting of 20,000 sequences, was selected from the E. coli genome at randomly chosen locations, by using a position weight matrix of the 11-nt-long true PIEs [Citation5]. Thus, DNA sequences belonging to this control group lack the 11-nt-long consensus motifs. Strikingly, our analysis reveals that the DNA context upstream of true PIEs is special and characterized by a significantly lower average free energy of non-consensus binding as compared to false PIE regions. We interpreted such an asymmetric profile and lower magnitude of the non-consensus free energy as leading to enhanced RNAP sliding on DNA upstream of the PIE when RNAP temporarily pauses during elongation. We also interpreted the higher magnitude of the non-consensus free energy surrounding false PIEs as preventing such RNAP sliding. These notions were verified by in vitro experiments [Citation5].
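The entropy-dominated free energy described above can be illustrated with a toy calculation. The sketch below is not the model of [Citation5]; it assumes a simplified "random binder" picture in which each hypothetical binder assigns a Gaussian random energy to each base type, slides over every window of the sequence, and contributes F = -ln Z (in units of kT) to an ensemble average. The contact length, the energy spread `sigma`, the number of binders and the example sequences are all illustrative assumptions.

```python
import math
import random

BASES = "ACGT"

def nonconsensus_free_energy(seq, contact_len=8, n_binders=200, sigma=2.0, seed=0):
    """Ensemble-averaged free energy (in kT) of random binders sliding on seq.

    Toy model (assumption): binder energy at a window is minus the sum of
    per-base random energies; Z sums Boltzmann weights over all windows.
    """
    n_sites = len(seq) - contact_len + 1
    if n_sites < 1:
        raise ValueError("sequence shorter than contact length")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_binders):
        k = {b: rng.gauss(0.0, sigma) for b in BASES}  # one random binder
        per_base = [k[b] for b in seq]
        # x_i = -U_i / kT for each window position i
        x = [sum(per_base[i:i + contact_len]) for i in range(n_sites)]
        x_max = max(x)
        log_z = x_max + math.log(sum(math.exp(v - x_max) for v in x))  # log-sum-exp
        total += -log_z
    return total / n_binders

if __name__ == "__main__":
    tracts = "A" * 16 + "T" * 16 + "G" * 16 + "C" * 16  # homopolymer tracts
    mixed = "ACGT" * 16                                  # same composition, no tracts
    print(nonconsensus_free_energy(tracts), nonconsensus_free_energy(mixed))
```

In this toy model the sequence built from homopolymer tracts gives the lower average free energy, because its windows span a wider range of energies across the binder ensemble. Which kinds of repetitive elements matter for RNAP, and how the per-bp profile <f> in Figure 1 is normalized, are specified in [Citation5] rather than here.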

Figure 2. Mechanistic model for controlling RNAP pausing by biased thermal fluctuations on repetitive genomic sequences, and its biological significance. During transcription elongation, the net forward-biased RNAP motion is coupled with an NTP-binding to the RNAP active site, causing the elongation reaction. Upon transient pausing at PIE signal, such an NTP binding is blocked, allowing RNAP diffusion upstream of PIE along DNA. RNAP pausing is stabilized when blocking of elongation by the PIE signal is combined with the RNAP diffusion biased by repetitive DNA sequence elements upstream of PIE. In particular, if DNA regions upstream of PIEs contain repetitive sequence elements, then paused complexes are greatly influenced by thermal fluctuations, on average, i.e. diffusional backtracking of RNAP is greatly enhanced. If, however, the sequences surrounding PIEs are non-repetitive, the fluctuations of paused complexes are limited on average, and thus such paused complexes have a higher probability to remain in an initial paused state without diffusional backtracking, facilitating a rapid resumption of elongation. Since PIEs containing such repetitive elements are enriched in transcriptional regulatory regions on E. coli genome [Citation4,Citation5], this mechanism allows the RNAP to “statistically” discriminate physiologically-relevant pausing from more abundant physiologically-irrelevant pausing, possibly leading to transcription-replication conflicts and the resultant genome instability [Citation37]. The box in the bottom of the panel illustrates our statistical mechanics modeling approach. The probability of RNAP to diffuse upstream of PIE site is controlled by the Boltzmann distribution, depending on the non-consensus free energy. If DNA regions upstream of PIE are enriched in repetitive sequence elements, the corresponding free energy is lower, and thus the probability for the diffusional backtracking is higher (see chapter 6 of [Ref. Citation1]).


Biological significance of Brownian-ratchet type mechanism controlling pausing

It is notable that the statistical control of transcription pausing by biased thermal fluctuations on repetitive genomic sequences belongs to a type of Brownian ratchet mechanism of transcription elongation [Citation7,Citation8,Citation50,Citation51]. According to the original view, during elongation, RNAP can diffuse forward and backward on DNA but the net forward motion on DNA is generated and biased by cognate NTP binding to the RNAP active site of the elongation complex [Citation52]. This mechanism prevents spontaneous backward motion by allowing elongation to the next cycle through condensation of the cognate NTP to the 3’ end of the transcript [Citation7,Citation8,Citation50]. Based on a molecular structure of the elongation complex [Citation53] combined with biochemical and single-molecule measurements [Citation52,Citation54], this mechanism assumes two rapid thermal fluctuations during elongation: (i) translocation fluctuation of the RNA-DNA hybrid and (ii) conformational fluctuation of the RNAP active site that includes the two protein motifs termed bridge helix and the trigger loop [Citation53,Citation55].

It is important to note that this proposed ratchet mechanism describes pause-free elongation (homogeneous tracking). However, it has not sufficiently described sequence-specific pausing due to heterogeneous DNA tracking [Citation8]. Our findings thus extend the Brownian-ratchet mechanism of transcription elongation by incorporating the entropy-dominated mechanism controlling RNAP pausing. In particular, upon pausing at any PIE sequence, the cognate NTP binding to the RNAP active site is inhibited, allowing time for RNAP to diffuse backward on DNA. Repetitive sequence elements upstream of the PIE can increase the number of paused complex conformations induced by thermal fluctuations, and therefore enhance diffusional backtracking of RNAP. On the other hand, non-repetitive sequences do not significantly increase the number of conformations of the complex during pausing, which leaves RNAP in a non-backtracked conformation, allowing RNAP to resume elongation rapidly. This mechanism determines (statistically) the fate of pausing at a PIE using Brownian motion biased by repetitive DNA sequences.
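The statistical choice between remaining at the pause site and diffusing upstream can be made concrete with a two-state Boltzmann estimate. This is a deliberately reduced sketch, assuming only two states with free energies expressed in units of kT; the function name `backtrack_probability` and the example values are hypothetical and are not taken from [Citation5].

```python
import math

def backtrack_probability(f_upstream, f_pause_site):
    """Boltzmann probability of the upstream (backtracked) state.

    Two-state assumption: P = exp(-F_up) / (exp(-F_up) + exp(-F_pause)),
    with both free energies given in units of kT.
    """
    delta = f_upstream - f_pause_site
    if delta > 700.0:  # exp would overflow; the probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(delta))

if __name__ == "__main__":
    # Lower upstream free energy (e.g. repetitive flank) favors backtracking.
    print(backtrack_probability(-2.0, 0.0))
    # Higher upstream free energy (e.g. non-repetitive flank) disfavors it.
    print(backtrack_probability(2.0, 0.0))
```

The point of the sketch is only the direction of the effect: lowering the upstream free energy relative to the pause site raises the probability of diffusional backtracking.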

We suggest therefore that the positive (i.e. enrichment of repetitive DNA sequence elements upstream of consensus PIEs) and negative (i.e. enrichment of non-repetitive DNA sequence elements upstream of consensus PIEs) genomic-design strategies [Citation47] for controlling RNAP pausing represent a physiologically crucial mechanism for E. coli cells. This strategy prevents pervasive pausing at all such PIE sequences. If pausing occurred at every PIE, it would lead to extensive DNA damage via transcription-replication collisions. By enhancing pausing at defined places where repetitive sequences are kept, the cell may maintain mechanisms for regulating pausing, and pausing in turn can regulate different stages of gene transcription (i.e. initiation, elongation and termination) and related biological processes such as transcript degradation. Indeed, our HT-seq measurement of pausing throughout the E. coli genome showed that true PIEs are enriched in regulatory regions including 5’ untranslated regions, anti-sense transcript-coding regions, and intrinsic transcription terminators [Citation4,Citation5]. Future characterization of false PIEs in the context of transcription-replication conflicts and the resultant R-loop formation will help to further elucidate the biological significance of the negative design strategy for transcription pausing.

Abbreviations

bp = base pair
nt = nucleotide
NTP = ribonucleoside triphosphate
HT-seq = high-throughput sequencing
PIE = pause-inducing element
RNAP = RNA polymerase

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Acknowledgments

We thank Donald L. Court for critical reading of the manuscript.

Additional information

Funding

This work was supported by the Japan Society for the Promotion of Science (JSPS) (number 16H06692).

References