140
Views
0
CrossRef citations to date
0
Altmetric
Research paper

Circular at the very beginning: on the initial genomes in the RNA world

, , & ORCID Icon
Pages 17-31 | Accepted 10 Jul 2024, Published online: 17 Jul 2024

ABSTRACT

It is likely that an RNA world existed in early life, when RNA played both the roles of the genome and functional molecules, thereby undergoing Darwinian evolution. However, even with only one type of polymer, it seems quite necessary to introduce a labour division concerning these two roles because folding is required for functional molecules (ribozymes) but unfavourable for the genome (as a template in replication). Notably, while ribozymes tend to have adopted a linear form for folding without constraints, a circular form, which might have been topologically hindered in folding, seems more suitable for an RNA template. Another advantage of involving a circular genome could have been to resist RNA’s end-degradation. Here, we explore the scenario of a circular RNA genome plus linear ribozyme(s) at the precellular stage of the RNA world through computer modelling. The results suggest that a one-gene scene could have been ‘maintained’, albeit with rather a low efficiency for the circular genome to produce the ribozyme, which required precise chain-break or chain-synthesis. This strict requirement may have been relieved by introducing a ‘noncoding’ sequence into the genome, which had the potential to derive a second gene through mutation. A two-gene scene may have ‘run well’ with the two corresponding ribozymes promoting the replication of the circular genome from different respects. Circular genomes with more genes might have arisen later in RNA-based protocells. Therefore, circular genomes, which are common in the modern living world, may have had their ‘root’ at the very beginning of life.

1. Introduction

It is now widely accepted that in the early evolution of life, there was a stage called the ‘RNA world’ [Citation1–4], during which RNA served as the carrier of both genes and functions. Indeed, to guarantee the capability of undergoing Darwinian evolution, which is essential for a life form [Citation5–7], the genetic and functional features are both indispensable; on the other hand, it seems quite unlikely that two different types of polymers – one for heredity and the other for functionality (with a ‘information linkage’ in between), like DNA and proteins in modern life – could have emerged simultaneously at the start. Due to its logical reasonability as well as accumulating evidence supporting the idea [Citation3,Citation4,Citation8], the RNA world scenario appears quite convincing and is now guiding many studies in the field of the origin of life, both from experimental and theoretical aspects [Citation9–12].

Moreover, in the context of recent advances in prebiotic synthesis [Citation13], particularly those concerning nucleotides [Citation14,Citation15], it seems likely that the RNA world simply represented the earliest stage of life (disregarding the ‘metabolism first’ hypothesis or similar theories postulating the initial emergence of life in the absence of Darwinian evolution, see Ref [Citation8] for an in-depth comment). Then the question arises: how could the RNA world have started, for example, in ‘a pool of nucleotides’? [Citation9].

Modern life is based on cellular forms (except virus and viroids, which live parasitically, relying on other cellular organisms). However, considering the principle of simplicity in the origin problem (‘the simpler, the more likely to have emerged de novo’), it has been hypothesized that there was a ‘naked’ phase at the beginning of the RNA world [Citation10,Citation16,Citation17]. That is, it is initially RNA molecules themselves that acted as the units of Darwinian evolution (i.e. Darwinian entities), and RNA-based ‘protocells’ emerged later – marking the ‘first major transition’ in the evolutionary history [Citation18–20]. Here, we focused our modelling on the precellular naked phase – therein, how could RNA have played the dual roles to ensure Darwinian evolution?

For an RNA molecule to act as a ribozyme (functional molecule), effective folding to reach an appropriate structure is important. However, to act as a good template in replication (‘genome’ for heredity), folding is unnecessary and folding into a compact form is even rather unfavourable. Then, we asked: though with only one type of polymer, could some form of labour division be taken among the RNA molecules? In fact, in modern life, enzymes and ribozymes typically adopt a linear form as a polymer, whereas circular DNA/RNA genomes are ubiquitous (e.g. bacteria, archaea, mitochondria, chloroplasts, viroids and some virus). Indeed, there is evidence that circularization of a linear RNA will interfere with its functional folding [Citation21] – for example, the hammerhead ribozyme within the circular genome of a viroid can only function in a linear form when participating the rolling circle replication of the genome [Citation22] (i.e. when residing in the circular genome, the ribozyme cannot fold into a ‘correct’ structure). Beside the advantage of potentially acting as a better template, a circular form can protect the genome from end-degradation, especially considering the chemical instability of RNA molecules (see Discussion for a detailed explanation).

As another clue indicating that we should consider the circular form of RNA at this early phase, circularization of linear RNAs seems actually inevitable. It has long been appreciated that a polymer might undergo intramolecular circularization when grown to sufficient length, so that its two ends can come within close proximity [Citation23]. For RNA, such intramolecular end-to-end ligation turns out to be rather ready, as a ‘parallel’ reaction of the intermolecular end-to-end ligation [Citation24,Citation25]. In other words, circular RNA should be common in the early phase and may have inevitably taken part in the ‘primordial’ Darwinian evolution occurring then.

According to the implication of simplicity on the origin problem, we may assume that the RNA world started with one functional ribozyme. However, when we envisioned a corresponding one-gene circular genome, the thing became somewhat ‘delicate’. How can the linear ribozyme be ‘produced’ from the circular genome? shows the two possible ways: one is the ‘accurate breaking’ of the ‘sense’ chain; the other is the ‘accurate synthesis’ of the ribozyme on the ‘antisense’ chain (including the starting of synthesis at correct position and the timely dropping from the template before the ultimate completion of circularization). Both of them appear quite difficult in such an early scene of evolution. In fact, for this early scene, in a context of ‘random breaking’ and ‘random synthesis’, most of the products derived from the circular genome would not be the ribozyme (thus ‘useless’). Then, could such an inefficient mode to generate the ribozyme be effective enough to support the spread of the one-gene genome? With this question in mind, we set out on our modelling work.

2. Results

2.1. About the model

We conducted the computer simulation using a Monte Carlo model similar to those used in our previous work concerning the naked phase of the RNA world [Citation26–29], but with consideration of circular RNA. The system is a two-dimensional N × N square grid (with toroidal topology to avoid edge effects). The assumption of a two-dimensional system is in part for simplicity and in part with consideration of the prebiotic milieus like mineral surfaces (with dispersal limitation) for the naked phase of the RNA world, as suggested by quite a lot of studies [Citation30–33]. Molecules are distributed within the grid rooms, including nucleotide precursors, nucleotides, and RNA. In each time step (Monte Carlo step), relevant events may occur to the molecules with defined probabilities. Nucleotide precursors may transform into nucleotides (randomly as A, G, C, or U). Nucleotides may assemble into linear RNAs via random ligation. A linear RNA may turn into a circular one via end-to-end ligation. Both linear RNAs and circular RNAs may conduct template-directed replication by using nucleotides and oligomers as substrates, except that linear RNAs have a lower templating efficiency. A circular RNA may turn into a linear one via the breaking of a certain phosphodiester bond. A linear RNA may also break into smaller fragments. A nucleotide may decay into a nucleotide precursor. A nucleotide residue at the end of a linear RNA may also decay into a nucleotide precursor. A linear RNA molecule containing a characteristic sequence is assumed to have a special function (i.e. may act as a ribozyme). Molecules may also move into an adjacent grid room. Refer to for descriptions of the associated parameters and see Methods for detailed explanations.

Table 1. Parameters used in the computer simulation.

In the beginning of a simulation, a certain number of nucleotide precursors are introduced into the system. As time goes on, nucleotides and RNA would emerge. On the other hand, the degradation of RNA and nucleotides may end in nucleotide precursors. In summary, the total materials of the system are constant. RNA sequences within the system are competing for the limited materials. Potentially, an RNA ‘species’ – with a specific sequence may spread (become thriving) in the system by virtue of its function (favouring its replication). Notably, with a ‘information resolution’ at the nucleotide level, the model is intrinsically suitable for the investigation of the early Darwinian evolution, which relied essentially on the sequence-function connection of RNA.

2.2. The spread of a one-gene circular genome

An RNA species catalysing the template-directed synthesis (thus the RNA replication) has long been suggested to have been the first ribozyme emerging in the RNA world, usually referred to as an ‘RNA replicase’ [Citation9,Citation34–37] (here ‘REP’ for short). Although experimental studies have not achieved such a ribozyme in a complete sense, we can see a constant progress towards this direction over the past three decades [Citation38–44]. Computer simulation has demonstrated that RNA-like ‘replicators’ could spread at the naked phase by virtue of replicase-like function (via favouring its own replication due to limited dispersal) [Citation45]. A previous simulation study of ours suggested that the RNA replicase emerging first may have been a simple template-directed ligase, i.e. a ribozyme catalysing the ligation of substrates (nucleotides and oligomers) aligned adjacently on the template [Citation28]. Therefore, here we considered such a ribozyme and its corresponding gene in our investigation of the one-gene circular genome.

First, we wanted to know whether a circular RNA comprising the sequence of REP could spread in our model system, under the restriction of inefficient ribozyme-production as mentioned above (). After introducing nucleotide precursors in quantity of TNPB (see for descriptions of parameters) in the beginning, at step 1 × 104, 50 linear RNA molecules with the sequence of REP and 50 linear control RNA molecules (without function) are inoculated into system. Then, circular RNA with the REP sequence appears (via intramolecular end-to-end ligation, with the probability PEL) and subsequently spreads in the system, whereas the control species cannot spread (see for a typical case). Notably, the circular RNA with the REP sequence (blue circles) is roughly equal in number to its complement (light blue circles), and the linear RNA with the REP sequence (the blue line) is also roughly equal in number to its complement (the light blue line). Such a feature, actually present in all simulation cases here, is in accord with the underlying mechanism of template-directed replication based on base-pairing, which is appropriately reflected in our model with a ‘information resolution’ at the nucleotide level.

Figure 1. Circular genomes and their ways to generate linear ribozymes in the beginning of the RNA world. (A) A ‘compact’ one-gene genome. Blue lines represent the ribozyme sequence and light-blue lines represent the corresponding complementary sequence. The ribozyme must be generated through accurate breaking (the arrow) of the sense chain or through ‘accurate RNA synthesis’ on the anti-sense chain template – i.e. starting at the correct position and timely dissociation of the produced linear sense chain before its circularization (the triangle). (B) A one-gene genome including a ‘noncoding’ sequence. Thin black lines represent the noncoding sequence, and thin grey lines represent its complement. The ribozyme could be generated by any chain-breaking events within the non-coding sequence (the arrows), or by any instances of dissociation of the produced linear sense chain (providing it already includes the ribozyme sequence) before its complete circularization. (C) A two-gene genome. Red lines represent the sequence of the second ribozyme, and light-red lines represent its complement. One ribozyme may be generated by breaking the circular sense chain at the side of the other ribozyme’s region, or the dissociation of the produced linear sense chain comprising a complete sequence of the ribozyme.

Figure 1. Circular genomes and their ways to generate linear ribozymes in the beginning of the RNA world. (A) A ‘compact’ one-gene genome. Blue lines represent the ribozyme sequence and light-blue lines represent the corresponding complementary sequence. The ribozyme must be generated through accurate breaking (the arrow) of the sense chain or through ‘accurate RNA synthesis’ on the anti-sense chain template – i.e. starting at the correct position and timely dissociation of the produced linear sense chain before its circularization (the triangle). (B) A one-gene genome including a ‘noncoding’ sequence. Thin black lines represent the noncoding sequence, and thin grey lines represent its complement. The ribozyme could be generated by any chain-breaking events within the non-coding sequence (the arrows), or by any instances of dissociation of the produced linear sense chain (providing it already includes the ribozyme sequence) before its complete circularization. (C) A two-gene genome. Red lines represent the sequence of the second ribozyme, and light-red lines represent its complement. One ribozyme may be generated by breaking the circular sense chain at the side of the other ribozyme’s region, or the dissociation of the produced linear sense chain comprising a complete sequence of the ribozyme.

Here, to avoid the influence of accidental RNA-degradation events, we inoculated a number of REP molecules initially to investigate the evolutionary dynamics (i.e. the evolutionary process over time). In fact, the outcome, i.e. the spread of the circular RNA containing the REP sequence, already suggests that such a kind of RNA could have emerged de novo from the proposed ‘nucleotide pool’. In the reality, the simultaneous appearance of so many REP molecules was obviously impossible. However, one molecule of the REP may have had chances to appear repeatedly, especially considering the great time-scale regarding the origin of life. For example, shows a modelling case that one linear REP molecule and one linear control molecule are inoculated into the system every 1 × 104 steps, and the circular REP genome eventually spreads in the system (after the spread, the periodic inoculation stops).

Figure 2. The spread of a circular genome containing only the REP gene. Legends: cir_rep – circular RNA containing the REP sequence; cir_repcom – circular RNA containing the complementary sequence of the REP; lin_rep – linear RNA containing the REP sequence; lin_repcom – linear RNA containing the complementary sequence of the REP; cir_ct – circular RNA containing the control sequence; cir_ctcom – circular RNA containing the complementary sequence of the control. (A) At step 1 × 104, 50 linear RNA molecules with the REP sequence and the same number of linear RNA molecules with the control sequence are inoculated into the system (at locations which are randomly chosen, the same below). (B) One linear RNA molecule with the REP sequence and one linear RNA molecule with the control sequence are inoculated into the system every 1 × 104 steps, until step 1.23 × 106, when the circular REP genome begins to spread.

Figure 2. The spread of a circular genome containing only the REP gene. Legends: cir_rep – circular RNA containing the REP sequence; cir_repcom – circular RNA containing the complementary sequence of the REP; lin_rep – linear RNA containing the REP sequence; lin_repcom – linear RNA containing the complementary sequence of the REP; cir_ct – circular RNA containing the control sequence; cir_ctcom – circular RNA containing the complementary sequence of the control. (A) At step 1 × 104, 50 linear RNA molecules with the REP sequence and the same number of linear RNA molecules with the control sequence are inoculated into the system (at locations which are randomly chosen, the same below). (B) One linear RNA molecule with the REP sequence and one linear RNA molecule with the control sequence are inoculated into the system every 1 × 104 steps, until step 1.23 × 106, when the circular REP genome begins to spread.

2.3. The roles of the circular genome and the ribozyme

Firstly, we asked: Could the circular RNA really play the role of genome in the spread of REP? So we turned off the templating capability of the linear RNA with the REP sequence or its complement (FLT = 0) midway (, at step 2.5 × 106). The outcome is the decline of REP – both the circular form and the linear form. In other words, after the turning off, the linear RNA with the REP sequence, which may act as a replicase ribozyme, can only be produced from the breaking of the circular RNA with the REP sequence, or via the copying on the template of the circular RNA with the complement of the REP sequence. Obviously, the spread can sustain, albeit at a lower level. Indeed, the low level of the spread (especially, the number of linear REP is rather low) is understandable considering the inefficient way of a circular genome to produce the corresponding ribozyme (). Simply put, though initially it may have been both the circular form and the linear form that played the role of genome, the circular form can actually play the role of genome alone.

Then, we wanted to confirm: When the circular form plays the role of genome alone, is the spread of REP really attributed to the function of the replicase ribozyme? Indeed, the answer is positive – when we turned off the function of the replicase ribozyme (PTLR = 0 after step 5 × 106, ), we saw the complete collapse of the spread.

Figure 3. The analysis on the roles of the circular genome and the ribozyme (in regard to REP). The legends are interpreted in the same way as those in . The case is just the one shown in , but excludes symbols related to the control sequence, which consistently hover near zero throughout the simulation. This omission prevents their interference with showcasing the linear REP and its complement, both of which exhibit very low levels after the adjustment of parameters (red arrows). At step 2.5 × 106, FLT, which is related to the ability of a linear RNA acting as a template, is altered from 0.5 (default value, see ) to 0 for the linear RNAs containing the REP sequence or its complementary sequence. Then, at step 5 × 106, PTLR, which reflects the catalytic ability of REP, is modified from 0.9 (default value) to 0. See text for a detailed explanation of the analysis.

Figure 3. The analysis on the roles of the circular genome and the ribozyme (in regard to REP). The legends are interpreted in the same way as those in Figure 2. The case is just the one shown in Figure 2A, but excludes symbols related to the control sequence, which consistently hover near zero throughout the simulation. This omission prevents their interference with showcasing the linear REP and its complement, both of which exhibit very low levels after the adjustment of parameters (red arrows). At step 2.5 × 106, FLT, which is related to the ability of a linear RNA acting as a template, is altered from 0.5 (default value, see Table 1) to 0 for the linear RNAs containing the REP sequence or its complementary sequence. Then, at step 5 × 106, PTLR, which reflects the catalytic ability of REP, is modified from 0.9 (default value) to 0. See text for a detailed explanation of the analysis.

Indeed, the modelling suggests that initially both the linear form and the circular form could have played the role of genome (refer to the drop of the spread when turning off the templating capability of the linear form at step 2.5 × 106 in ), wherein the linear form may have been even more important considering the restriction for the circular form to derive the ribozyme (). But in the long run, it should be the circular form that become more and more important because when the ribozyme evolved towards more efficient – thus with a more compact and complicated structure, the linear form would be more and more unsuitable to act as template. Then, we asked: Can our modelling provide some messages on this tendency? The factor representing the suitability of a linear form RNA to act as template, relative to a circular RNA, is here assumed to be 0.5 as a default value (FLT = 0.5, refer to ). When the factor is turned down for the linear RNA containing the REP sequence midway, the spread of REP declines only limitedly (). It is worth mentioning that different from the case shown in , here this factor is not adjusted for the linear RNA containing the REP’s complement because it seems unreasonable to assume that the complementary sequence of the ribozyme would also adopt a compact and complicated structure. Notably, even FLT is finally turn off (FLT = 0 after step 6 × 106), which means that the linear REP completely loses the ability to act as template, the spread is not apparently impacted. As a contrast, if the circular form is assumed to be unable to emerge (PEL = 0), the turning down of FLT for the linear form with a REP sequence may be fatal (), because the passing down of the genetic information would be problematic. That is to say, the labour division between the circular form (as the genome) and the linear form (as the ribozyme) should have been more and more clear with the ribozyme’s evolution.

Figure 4. The influence of reducing template capability of the linear REP. The legends are interpreted in the same way as those in . The symbols concerning the control sequence, which are nearly at the zero level throughout the simulation, are omitted. (A) The case is just the one shown in , except that FLT , which represents the template suitability of a linear RNA, is adjusted for the REP ribozyme midway (from the default value 0.5 to 0.2, to 0.1 and finally to 0; see red arrows), the outcome suggests that when the circular RNA can act as a genome, the evolution of the ribozyme towards more complicated folding, (thus with a reduced template capability) would have little impact on its spread as a species. (B) The case is run under the same situations as the case shown in (a), except that PEL, the probability for RNA’s intramolecular end-to-end ligation, is set to 0 and thus no circular RNA can appear in the system. The outcome suggests that when the linear REP ribozyme has to act as a template itself, its evolution towards greater structural complexity may be greatly hindered by the simultaneous loss of its template capability.

Figure 4. The influence of reducing template capability of the linear REP. The legends are interpreted in the same way as those in Figure 2. The symbols concerning the control sequence, which are nearly at the zero level throughout the simulation, are omitted. (A) The case is just the one shown in Figure 2A, except that FLT , which represents the template suitability of a linear RNA, is adjusted for the REP ribozyme midway (from the default value 0.5 to 0.2, to 0.1 and finally to 0; see red arrows), the outcome suggests that when the circular RNA can act as a genome, the evolution of the ribozyme towards more complicated folding, (thus with a reduced template capability) would have little impact on its spread as a species. (B) The case is run under the same situations as the case shown in (a), except that PEL, the probability for RNA’s intramolecular end-to-end ligation, is set to 0 and thus no circular RNA can appear in the system. The outcome suggests that when the linear REP ribozyme has to act as a template itself, its evolution towards greater structural complexity may be greatly hindered by the simultaneous loss of its template capability.

2.4. The introduction of a noncoding sequence

Above we have shown that a one-gene circular genome could spread by virtue of the function of the ribozyme it encodes (), though the modes for the genome to generate the ribozyme are featured with a strong restriction (). With the restriction, the number of the ribozyme is low and the spread of the genome is limited. Interestingly, we found that this situation may be improved in a simple way. That is, in many cases, a noncoding sequence is introduced into the circular genome, and the restriction is partially released (i.e. the requirement for ‘accurate breaking’ or ‘accurate synthesis’ is no longer so strict, refer to for a schematic).

shows a typical case of such transformation, in which a 12nt genome comprising an 8nt REP sequence plus a 4nt noncoding sequence takes over the compact 8nt REP genome to spread in the system soon after the beginning (see for the spatial distribution snapshots of this case, in which the size of the circular genomes is straightforwardly represented by the size of circles in the pictures). Interestingly, we found that just for the case shown in , when we run the simulation with a larger time scale, a transition from the original 8nt circular genome to a 10nt one occurs (). In this case, it is clear that when a 2nt noncoding sequence is introduced into the circular genome and the restriction is somewhat relieved, the number of the circular genome, as well as the number of the linear ribozymes, reaches a higher level. Additionally, we found that for the case shown in , wherein one REP molecule is inoculated intermittently, the ultimately spreading circular genome is actually 9nt (). That should be the major reason why the level of its spread () is significantly higher than the case shown in , in which the spreading circular genome is only 8nt in length at that stage. Somewhat strikingly, we even recorded a case in which the circular genome evolves twice – from 8nt to 9nt and then to 12nt ().

Figure 5. The emergence of circular genomes with a noncoding sequence. The upper panel of a subfigure here is in the same form as a subfigure in , except that the complementary sequences of the REP and the control, which are roughly equal in number to their ‘sense chain’ throughout the simulation, are omitted. The lower panel shows the evolution of circular REP genomes with different lengths within the case – corresponding to the legends, for example, ‘8nt’ represents a circular genome only containing the REP sequence (note the default REP sequence is 8nt in length; see ); ‘12nt’ means a circular genome containing a REP sequence plus a 4nt noncoding sequence. (A) The case is in the same situation (i.e. parameter setting) as the case shown in , except with a different random seed. See for some key snapshots concerning spatial distribution of this case. (B) The case is just the one shown in , but with a longer period of simulation time (more Monte Carlo steps). (C) The case is just the one shown in . (D) The case is also in the same situation of the case shown in , except with a different random seed.

Figure 5. The emergence of circular genomes with a noncoding sequence. The upper panel of a subfigure here is in the same form as a subfigure in Figure 2, except that the complementary sequences of the REP and the control, which are roughly equal in number to their ‘sense chain’ throughout the simulation, are omitted. The lower panel shows the evolution of circular REP genomes with different lengths within the case – corresponding to the legends, for example, ‘8nt’ represents a circular genome only containing the REP sequence (note the default REP sequence is 8nt in length; see Table 1); ‘12nt’ means a circular genome containing a REP sequence plus a 4nt noncoding sequence. (A) The case is in the same situation (i.e. parameter setting) as the case shown in Figure 2A, except with a different random seed. See Figure 6 for some key snapshots concerning spatial distribution of this case. (B) The case is just the one shown in Figure 2A, but with a longer period of simulation time (more Monte Carlo steps). (C) The case is just the one shown in Figure 2B. (D) The case is also in the same situation of the case shown in Figure 2A, except with a different random seed.

Figure 6. Snapshots showing the emergence of a circular one-gene genome and the subsequent introduction of a noncoding sequence. Raw materials (nucleotide precursors) are shown as yellow background, with colour depth representing their quantity in the corresponding grid room. At step 10,000, 50 linear RNA molecules with the sequence of REP (horizontal blue bars) and 50 linear control RNA molecules (vertical green bars) are inoculated into system (at locations chosen randomly). The snapshot at step 30,000 shows the spread of linear REP molecules (at two different regions). The snapshot at step 80,000 indicates the emergence of the 8nt circular REP genome (small blue circles at the top-left corner, also spreading to the bottom left corner because of the grid’s toroidal topology). The snapshot at step 130,000 indicates the emergence of the 12nt circular REP genome comprising a 4nt noncoding sequence (blue circles at the central part, larger than the 8nt circles). The snapshot at step 220,000 shows the spread of the 12nt genome and the simultaneous decline of the 8nt genome. The snapshot at step 900,000 shows the complete domination of the system by the 12nt genome. See for the evolutionary dynamics of this case.

Figure 6. Snapshots showing the emergence of a circular one-gene genome and the subsequent introduction of a noncoding sequence. Raw materials (nucleotide precursors) are shown as yellow background, with colour depth representing their quantity in the corresponding grid room. At step 10,000, 50 linear RNA molecules with the sequence of REP (horizontal blue bars) and 50 linear control RNA molecules (vertical green bars) are inoculated into system (at locations chosen randomly). The snapshot at step 30,000 shows the spread of linear REP molecules (at two different regions). The snapshot at step 80,000 indicates the emergence of the 8nt circular REP genome (small blue circles at the top-left corner, also spreading to the bottom left corner because of the grid’s toroidal topology). The snapshot at step 130,000 indicates the emergence of the 12nt circular REP genome comprising a 4nt noncoding sequence (blue circles at the central part, larger than the 8nt circles). The snapshot at step 220,000 shows the spread of the 12nt genome and the simultaneous decline of the 8nt genome. The snapshot at step 900,000 shows the complete domination of the system by the 12nt genome. See Figure 5A for the evolutionary dynamics of this case.

No matter how, the introduction of a noncoding sequence, though being apparently disadvantageous for the completion of the genome’s replication, seems to be an acceptable solution to ease the restriction for a one-gene circular genome to generate its corresponding ribozyme (that is, in a context of ‘random breaking’ and ‘random synthesis’, more of the products derived from the circular genome would be the ribozyme). Then, we reasoned that this feature may have favoured the emergence of a second gene into the genome – i.e. via the mutation of the noncoding sequence.

2.5. The spread of a two-gene circular genome

In fact, for the ribozymes that may have emerged early in the RNA world, there is another candidate: a ribozyme catalysing the synthesis of nucleotides, i.e. a nucleotide-synthetase ribozyme (‘NR’). This ribozyme can favour its own replication by supplying the building blocks and thus may (in principle of Darwinian evolution) as well have spread in the prebiotic pool – the plausibility of which has been supported by computer modelling [Citation27]. By experimental work, such a synthesis reaction catalysed by RNA has been shown to be feasible [Citation46–48]. Actually, when we assumed that the ribozyme emerging first in the naked phase is an NR, we also saw the spread of the corresponding one-gene circular genome – interestingly, with the introduction of a noncoding sequence as well (Fig. S1). No matter how, here what we are curious about is: if we assume the NR as the second ribozyme emerging in the scene, can the two-gene circular genome spread in the system?

Similar to the situation concerning the study of the one-gene genome, firstly, we wanted to know whether a circular RNA comprising both the sequences of REP and NR could spread in the model system. The way of producing the ribozymes from the circular genome is shown in . It may be expected that the two ribozymes, which favour the replication of the circular genome from different aspects (one for the template-directed copying and the other for the synthesis of the building blocks), may aid the two-gene genome to spread in the system. Our simulation turns out to support such a scene. After introducing nucleotide precursors initially, at step 1 × 104, a hundred circular RNA molecules with the sequences of REP and NR (16nt in length, 8nt for each gene), together with the same number of control RNA molecules (also 16nt in length), are inoculated into system. Then, the circular RNA with both the two genes arises and spreads in the system, whereas the control species cannot (see for a typical case). To be clearer, in the figure, the number of the complement chains is not demonstrated, which is actually also roughly equal to the ‘sense’ chains. Additionally, the number of linear REP or NR shown here are the ones shorter than 12nt, as corresponding ribozymes (i.e. lin-rep-rib and lin-nr-rib) – note that a linear RNA comprising the sequence of a ribozyme but with a length of or greater than 1.5 times of this sequence (1.5 × 8nt = 12nt) is in our model assumed to be unable to fold into an appropriate functional structure (see Methods). In fact, the breaking of a two-gene circular genome or the dissociation of a synthesized chain from the genome (refer to ) might generate a linear RNA not short enough to be able to fold correctly as a ribozyme, but the short length may be ‘reached’ with the degradation of the RNA (e.g. via end-decay of nucleotide residues).

Then, we turned to the issue of whether the second gene could emerge through the mutation of the noncoding sequence within a one-gene circular genome. We considered a one-gene circular genome with a REP sequence plus a noncoding sequence. To facilitate the emergence of the second gene, we assumed an NR sequence that differs from the noncoding sequence by only one residue and adopted a relatively higher error rate of template-directed copying (PFP is adjusted to 0.01 from its default value 0.001, refer to ). After initially introducing nucleotide precursors, at step 1 × 104, a hundred circular RNA molecules comprising the REP sequence and the noncoding sequence (16nt in length, 8nt for each), together with the same number of control RNA molecules (also 16nt in length), are inoculated into system. Then, the one-gene (REP) genome spreads in the system, whereas the control does not. Next, interestingly, we observed the arising of the two-gene (REP-NR) genome alongside the decline of the original one-gene genome (, see for the spatial distribution snapshots). It is noteworthy that in this case, there is a low-level spread of one-gene genome, both for REP and NR (depicted as blue and red circles respectively), which should be largely attributed to mutations in the two-gene genome due to the relatively high error rate assumed for replication.

Figure 7. The spread of a two-gene circular genome. Legends: cir_repnr – circular RNA containing both the REP and the NR sequences; cir_rep – circular RNA containing the REP (but not the NR) sequence; cir_nr – circular RNA containing the NR (but not the REP) sequence; lin_rep_rib – linear RNA containing the REP sequence and with a chain length shorter than 1.5 times of the ribozyme’s characteristic sequence (thus able to act as the corresponding ribozyme, see text for details); lin_nr_rib – linear RNA containing the nr sequence and with a chain length shorter than 1.5 times of the ribozyme’s characteristic sequence (thus able to act as the corresponding ribozyme); cir_ctct1 or cir_ct – circular RNA containing a control sequence (see below for details). The complementary sequences of the circular REP-NR and the control, which are roughly equal in number to their ‘sense chain’ throughout the simulation, are omitted. (A) At step 1 × 104, a hundred circular RNA molecules with the sequences of REP and NR (16nt in length, 8nt for one gene), together with the same number of control RNA molecules (also 16nt in length, ‘cir_ctct1’), are inoculated into system. The 16nt control RNA comprises the default 8nt control sequence (refer to ) plus another 8nt sequence (‘AACGCUCG’). (B) At step 1 × 104, a hundred circular RNA molecules with the REP sequence and a noncoding sequence (16nt in length, 8nt for each), together with the same number of control RNA molecules (also 16nt in length), are inoculated into system. The noncoding sequence is ‘UGACGCAG’, which is assumed to be only one residue different from the NR sequence (‘UGAUGCAG’, refer to table 1). The 16nt control RNA comprises an 8nt control sequence ‘ACUGACGU’ plus the noncoding sequence assumed here. The legend ‘cir_ct’ means circular RNA containing the 8nt control. TNPB = 2 × 105, PBB = 2 × 10−5, PFP = 0.01, PNDE = 0.002, and PNF = 0.001. See for some key snapshots concerning spatial distribution of this case.

Figure 7. The spread of a two-gene circular genome. Legends: cir_repnr – circular RNA containing both the REP and the NR sequences; cir_rep – circular RNA containing the REP (but not the NR) sequence; cir_nr – circular RNA containing the NR (but not the REP) sequence; lin_rep_rib – linear RNA containing the REP sequence and with a chain length shorter than 1.5 times of the ribozyme’s characteristic sequence (thus able to act as the corresponding ribozyme, see text for details); lin_nr_rib – linear RNA containing the nr sequence and with a chain length shorter than 1.5 times of the ribozyme’s characteristic sequence (thus able to act as the corresponding ribozyme); cir_ctct1 or cir_ct – circular RNA containing a control sequence (see below for details). The complementary sequences of the circular REP-NR and the control, which are roughly equal in number to their ‘sense chain’ throughout the simulation, are omitted. (A) At step 1 × 104, a hundred circular RNA molecules with the sequences of REP and NR (16nt in length, 8nt for one gene), together with the same number of control RNA molecules (also 16nt in length, ‘cir_ctct1’), are inoculated into system. The 16nt control RNA comprises the default 8nt control sequence (refer to Table 1) plus another 8nt sequence (‘AACGCUCG’). (B) At step 1 × 104, a hundred circular RNA molecules with the REP sequence and a noncoding sequence (16nt in length, 8nt for each), together with the same number of control RNA molecules (also 16nt in length), are inoculated into system. The noncoding sequence is ‘UGACGCAG’, which is assumed to be only one residue different from the NR sequence (‘UGAUGCAG’, refer to table 1). The 16nt control RNA comprises an 8nt control sequence ‘ACUGACGU’ plus the noncoding sequence assumed here. The legend ‘cir_ct’ means circular RNA containing the 8nt control. TNPB = 2 × 105, PBB = 2 × 10−5, PFP = 0.01, PNDE = 0.002, and PNF = 0.001. See Figure 8 for some key snapshots concerning spatial distribution of this case.

Figure 8. Snapshots showing the deriving of a two-gene circular genome from a one-gene circular genome with a noncoding sequence. Raw materials (nucleotide precursors) are shown as yellow background, with colour depth representing their quantity in the corresponding grid room. At step 10,000, a hundred circular RNA molecules with the REP sequence and a noncoding sequence (blue circles), together with the same number of control RNA molecules (green circles), are inoculated into the system (at locations chosen randomly). The snapshot at step 150,000 shows the spread of the circular genome containing REP. The snapshot at step 530,000 indicates the emergence of the REP-NR circular genome (black circles in the bottom-left region). The snapshot at step 880,000 shows the spread of the REP-NR circular genome. The snapshot at step 1,100,000 shows further spread of the REP-NR circular genome and the simultaneous decline of the genome containing only REP (blue circles). Note that red circles denote the circular genome containing only NR. The snapshot at step 3,000,000 indicates the ‘absolute domination’ of the REP-NR genome in the system. See for the evolutionary dynamics of this case.

Figure 8. Snapshots showing the deriving of a two-gene circular genome from a one-gene circular genome with a noncoding sequence. Raw materials (nucleotide precursors) are shown as yellow background, with colour depth representing their quantity in the corresponding grid room. At step 10,000, a hundred circular RNA molecules with the REP sequence and a noncoding sequence (blue circles), together with the same number of control RNA molecules (green circles), are inoculated into the system (at locations chosen randomly). The snapshot at step 150,000 shows the spread of the circular genome containing REP. The snapshot at step 530,000 indicates the emergence of the REP-NR circular genome (black circles in the bottom-left region). The snapshot at step 880,000 shows the spread of the REP-NR circular genome. The snapshot at step 1,100,000 shows further spread of the REP-NR circular genome and the simultaneous decline of the genome containing only REP (blue circles). Note that red circles denote the circular genome containing only NR. The snapshot at step 3,000,000 indicates the ‘absolute domination’ of the REP-NR genome in the system. See Figure 7B for the evolutionary dynamics of this case.

Figure 9. The events occurring in the model and their associated parameters. Legends: Np – nucleotide precursor; Nt – nucleotide; REP – RNA replicase ribozyme; NR – nucleotide-synthetase ribozyme. Solid arrows denote chemical reactions and dashed arrows represent other events. The events concerning the template-directed synthesis are here drawn with respect to a template segment (within purple square brackets), which may belong to either a circular RNA or a linear RNA. Note that the region depicted here represents one grid room in the N × N grid of the model system. See text for detailed explanations.

Figure 9. The events occurring in the model and their associated parameters. Legends: Np – nucleotide precursor; Nt – nucleotide; REP – RNA replicase ribozyme; NR – nucleotide-synthetase ribozyme. Solid arrows denote chemical reactions and dashed arrows represent other events. The events concerning the template-directed synthesis are here drawn with respect to a template segment (within purple square brackets), which may belong to either a circular RNA or a linear RNA. Note that the region depicted here represents one grid room in the N × N grid of the model system. See text for detailed explanations.

Alternatively, we envisioned a scenario akin to the origin of paralogous genes in modern genomes, which involves gene-duplication followed by the emergence of new genes through mutation (because additional copies of the same gene may be ‘of little use’). In the context of our current modelling, for the one-gene REP genome, duplication could occur, for instance, through accidental ligation of two linear REP molecules within the system and subsequent self-circularization. The generation of REP ribozyme from the REP-REP circular genome should be easy, resembling a two-gene circular genome (refer to ). Noticeably, the REP-REP genome is even more favoured than the REP-noncoding (REP plus a noncoding sequence) genome because the breaking or synthesis at either side of the duplicated REP regions could lead to a REP ribozyme. Indeed, after inoculating a hundred circular RNA molecules with the REP-REP sequence (at step 1 × 104, alongside control molecules), the genome with the duplicated gene spreads throughout the system. Then, a period of time later, we saw the arising of the REP-NR genome concurrent with the decline of the REP-REP genome (Fig. S2, see Fig. S3 for spatial distribution snapshots). Notably, to facilitate the mutation-driven appearance of NR, here, NR is assumed to differ from REP by only one residue, and as with the previous case (), a relatively higher error rate concerning replication is adopted (PFP = 0.01).

3. Discussion

In the present study, we conducted computer simulation on the initial stage of the RNA world (the precellular phase), examining the plausibility of a circular genome. Our findings suggest that a one-gene circular genome could have spread () even under the stringent constraint of accurately producing the corresponding ribozyme ( for a schematic). This restriction could have been alleviated by introducing a noncoding segment into the genome (; for a schematic). A second gene might have derived from the noncoding sequence, leading to the appearance of a two-gene circular genome, which could have become thriving by virtue of the two ribozymes it encodes (; for a schematic).

Here, we propose that the requirement of labour division between the genome and functional molecules is the key ‘appeal’ to involve a circular genome, which does not tend to fold structurally. In particular, the division is advantageous in the long run when ribozymes evolved towards higher efficiency – associated with more complicated structures and thus with less templating capability (). In fact, such a requirement of labour division has attracted quite a few modelling studies in the area. For example, it was shown that in RNA-based protocells, the ‘trade-off’ between the catalytic activity and the template capability of a ribozyme strand (actually related to the folding degree) may lead to the labour division between the ribozyme strand and its complementary strand, i.e. the former as a functional molecule and the latter as a gene [Citation49]. Such a labour division would result in a quantitative asymmetry of the two strands: the ribozyme strand is more than the gene strand in number. Then, in a later study, it was highlighted that the ribozyme strand may have evolved towards higher catalytic efficiency ‘in order to’ improve the fitness of the protocells containing it, whereas the complementary strand of the ribozyme may have evolved towards higher template capability ‘in order to’ increase the fitness of these molecules (the two strands as one RNA species in the sense of genetic information) inside the protocell [Citation50]. Notably, both the two studies considered the labour division in a context of the cellular phase, whereas here we considered the division that may have already occurred in the naked phase. In addition, both of them thought that the labour division may have been implemented between the two complementary strands of RNA, which is different from our notion here concerning the distinct roles of circular and linear RNA. Note that the trade-off of the ribozyme strand is released with the involvement of the circular genome, and the two complementary strands of the circular genome turn out to be roughly equal in number (as mentioned in Results) – they are actually equivalent in the sense of carrying genetic information. A third example of related modelling work suggested that the labour division may have been implemented with the advent of DNA in the RNA world [Citation51], wherein RNA act as functional molecules and DNA carried the corresponding genes – thereby releasing the trade-off of the ribozyme strand. No matter how, all these studies suggested that the advantage of such a labour division, as a ‘driving force’, may have had important impacts on the evolutionary process in the early stage of life.

Remarkably, a labour division between the genome and ribozymes should have been not only advantageous in the long run but also crucial for resisting parasites (RNA molecules without any function) just in the beginning scene. As explained [Citation51], due to the trade-off for a ribozyme strand to act as a catalyst and a template, the ribozyme, as an RNA species, may have found itself at a disadvantage in competition with parasites, which could act exclusively as templates. The labour division could have released the trade-off and ‘saved’ the ribozyme species. For instance, as supposed in the current context, it is circular RNAs that may have taken the role of heredity and the competition should have no longer been unfair between the ribozyme species and the parasites.

Another advantage of involving a circular genome is to resist end-degradation of RNA molecules. In modern prokaryotes, circularity of the DNA chromosome is believed to be a strategy for resisting end-degradation, which is caused mainly by exonuclease cleavage of terminal phosphodiester bonds. RNA degradation in modern cells has been studied in detail [Citation52], in which it was shown that exonuclease activities are apparently more prevalent than endonuclease activities. This is understandable since the terminal bonds should be more exposed in solution. However, as far as we know, there is as yet no direct evidence for the assertion that RNA’s chemical degradation (in the absence of enzymes) is also more readily at the terminal bonds. Another form of RNA end-degradation may have stemmed from the spontaneous decay of nucleotide residues at chain ends. Indeed, residues within an RNA chain, which is less exposed to the solution, may have been difficult to decay. However, terminal residues should have been subjected to decay in a non-negligible way, albeit likely to a lesser extent in comparison with the decay of free nucleotides. In practice, when considering RNA’s end-degradation, here we only assumed the spontaneous decay of end-residues, but not the potentially easier breaking of the terminal phosphodiester bonds in RNA chains. This is a conservative consideration. If both forms of end-degradation were present, the benefit of the circularization in preventing end-degradation should have been more apparent.

The involvement of a circular RNA genome in the RNA world makes the rolling-circle mode of replication feasible, which might have arisen subsequently. Indeed, so far, an intractable issue concerning the RNA world scenario seems to involve the dissociation of the complementary strand from the template to enable another round of template-directed replication. In the absence of the aid of a helicase, the separation of RNA’s double strands appears to be quite difficult, especially if the RNA chain is long. The mode of rolling-circle, which is found in viroids [Citation22,Citation53] and viroid-like satellite RNAs [Citation54,Citation55], has been proposed to avoid this difficulty via continuing strand-displacement. Wherein, in particular, viroids have been suggested to be a ‘relic’ of the ancient RNA world [Citation56]. In fact, both experimental and theoretical work has started to explore the plausibility of such a replication mode in the RNA world [Citation57–59]. Using a model similar to the one employed here to address this issue in the future is attractive and seems promising to provide insights into the Darwinian evolution involved.

It is understandable that limited dispersal is important for the ribozymes to spread in a naked scene – to ensure that the species could benefit from its own function (e.g. REP may favour the template-directed replication of RNA; NR may produce building blocks for RNA synthesis). This is evidenced in almost each modelling study concerning this precellular phase (e.g. Ref [Citation27–29,Citation45,Citation60,Citation61]). In the present context, the replication of the circular genome could benefit from the ribozyme(s) it encodes due to the dispersal limitation (refer to the snapshots in and Fig. S3; the associated parameter is PMN, see Methods for details). In fact, in a previous study of ours, we have modelled the circular genome with four genes in RNA-based protocells [Citation62]. The purpose of that study is to explore the plausibility of the involvement of gene-linkage to avoid ‘gene loss’ during the division of the protocells [Citation16,Citation63]. The protocells containing the four-gene circular genome (referred to as ‘chromosome’ therein) are shown to be able to spread by virtue of the four different functional ribozymes from distinct aspects. Self-cleavage is assumed in that model by the intergenic sequence with a hammerhead ribozyme-like function to produce these ribozymes (actually inspired by the rolling circle replication which involves the hammerhead ribozyme for producing single copies of the viroid genome through self-cleaving) [Citation22]. The naked scene modelled here can be deemed as a ‘prequel’ of that cellular scene. In this prequel, as an encouraging outcome, we saw that even without the self-cleaving function (i.e. just with ‘random breaking or synthesis’ as mentioned already), the one-gene and two-gene circular genomes can become thriving as well (thus conforming with the implication of simplicity on the origin problem). No matter how, it is expected that a membrane boundary, as a ‘stronger version’ of dispersal limitation, may have been necessary to ensure the cooperation of more ribozymes [Citation61,Citation64]. That is, the genomes with more genes are likely to have emerged after the ‘major transition’ in the evolution from the naked phase to the cellular phase [Citation18–20].

4. Methods

4.1. The events occurring in the model system

is a schematic of the events in each time step (i.e. Monte Carlo step), as well as the associated probabilities and factors (refer to for their descriptions). Only the molecules within the same grid room may interact with each other. A molecule may move to an adjacent room (related probability: PMN for nucleotides and RNA and PMNP for nucleotide precursors).

A nucleotide precursor may form a nucleotide (randomly as A, G, C, or U) in a non-enzymatic way (PNF) or catalysed by NR (PNFR). A nucleotide may also decay into their precursors (PND). Nucleotides and linear RNAs may conduct random intermolecular ligation (PRL) to form longer chains. A linear RNA may conduct intramolecular end-to-end ligation (PEL) and thus transform into a circular one. An RNA molecule may attract substrates (nucleotides or oligomers) (PAT) via base-pairing with some error rate (PFP). The attraction following an adjacent substrate that is already located on the template would be easier than the de novo attraction (FDA) (i.e. the ‘primer effect’). In addition, a linear RNA may be more difficult to act as a template to attract substrates than a circular RNA (FLT). The substrates aligned adjacently on the template may be ligated in a non-enzymatic way (PTL) or catalysed by REP (PTLR) – that is, the template-directed synthesis. The substrates or the full complementary chain may separate from the template (PSP). Phosphodiester bonds within an RNA chain may break (PBB) – thus a circular RNA may turn into a linear one and a linear RNA may split into fragments. A nucleotide residue at the end of an RNA chain may decay into a nucleotide precursor (PNDE). The REP and NR may catalyse their corresponding reaction for multiple times (TREP and TNR) in a time step.

Notably, similar to our previous modelling work concerning the scenes in the RNA world, the energy problem is here not considered explicitly. For example, nucleotides and oligonucleotides are implicitly assumed to be activated – in particular, when they form from the degradation of RNA, they are assumed to be activated again immediately to be able to be reused in the further synthesis of RNA. In fact, such in situ activation has been revealed to be possible by lab work [Citation65–67]. In reality, the energy source may have involved chemical energy in the hatchery of the primordial life, such as hydrothermal vents at the sea bottom [Citation68–70] or hydrothermal fields on land [Citation71,Citation72], as supposed. Since the substrates are here assumed to be always ‘activated’, the RNA species in the model system are competing for materials but not energy – as mentioned already, the total materials in the system are limited (TNPB). Certainly, in reality, competitions for materials and energy are both possible in Darwinian evolution.

4.2. The setting of parameters

The probabilities and the factors concerning the events in the system should be set according some rules. Reactions catalysed by ribozymes should be much more efficient than corresponding non-enzymatic reactions, so PTLR >> PTL and PNFR >> PNF. Template-directed ligation should be apparently more efficient than intermolecular random ligation, which should be roughly in the same scale as the intramolecular end-to-end ligation of a linear RNA, so PTL >> PRLPEL. Here, nucleotide residues within the chain are assumed to be unable to decay – they should be protected therein, whereas those at the end of the chain, which is ‘semi-protected’, decay at a rate lower than that of free nucleotides, i.e. PNDE < PND. The template-directed synthesis without the aid of a proteinase could not have a very high fidelity, so PFP could not be set a very small value. A linear RNA should have a less templating-efficiency than a circular RNA, so FLT < 1; the de novo attraction of a substrate onto a template may be more difficult than the attraction following a primer, so FDA > 1 (see below for a detailed explanation about the ways these factors work). Other considerations may include: PBB may be higher than PRL but lower than PNDE; PMN < PMPN; PNF < PND, etc.

In consideration of the computational intensity, total materials in the system (TNPB) are assumed obviously smaller in scale than the potential corresponding situation in reality; similarly, the characteristic RNA sequence for REP or NR assumed here (8nt) is likely to be much shorter than the corresponding ribozyme in reality. However, such simplifications are believed to be not in conflict with the fundamental mechanisms that may be reflected by the modelling.

Here, a large portion of parameter values were actually set based on our experience in previous modelling studies concerning the naked phase of the RNA world [Citation26–29]. But notably, in principle, a machine learning-like approach may be useful to automatically explore the parameter values supporting such supposed scenes in the origin of life [Citation73]. The default values listed in were adopted to shape the cases for demonstrating our results. Though the outcome of a simulation may be influenced by the change of some ‘key parameters’ (e.g. refer to ), it turned out to be fairly robust against ‘moderate adjustments’ of most parameters.

4.3. Some detailed assumptions in consideration of relevant mechanisms

A linear RNA containing the characteristic sequence of a ribozyme can act as the ribozyme only when it is shorter than 1.5 times of the characteristic sequence, considering that too many redundant residues may seriously interfere with the folding of the catalytic domain.

An RNA template may attract a substrate (nucleotide or oligomer) at any ‘empty’ site, provided that the substrate is no longer than the empty site. The attraction with a foregoing substrate (as a ‘primer’) on the template is easier than the de novo attraction by a factor of FDA (FDA >1). A linear template would have a lower templating efficiency than a circular template by a factor of FLT (FLT <1). That is, for the attraction following a primer on a circular template, the corresponding probability is PAT; for the de novo attraction on a circular template, the probability is actually PAT/FDA; for the attraction following a primer on a linear template, actually PAT × FLT; for the de novo attraction on a linear template, actually PAT × FLT/FDA.

Notably, the setting of FLT < 1 is here important for understanding the labour division between circular genomes and corresponding ribozymes. Indeed, in principle, circular RNAs may also adopt tightly folding structures (like the ones in viroids) [Citation22]. However, circular RNA genomes in the beginning should have derived from the circularization of ribozymes (for instance, in the cases shown in , the one-gene circular genome arises from the circularization of REP). While a ribozyme signifies a functional, folding RNA molecule, a circular RNA with the same sequence would be topologically hindered in folding, leading to a more relaxed structure. Especially, the RNA species in the early RNA world should have been relatively short; without free ends, a short circular RNA may ‘find itself difficult to fold’ by forming internal base pairs (even if feasible, likely in a quite limited mode). Then, in a two-gene circular genome, the two ‘ribozyme regions’ would almost certainly interfere with each other in respect of base-pairing – thus also could not folding tightly.

The probability of the separation of the two strands of a duplex RNA is actually assumed to be PSPr, where r = n 1/2, and n is the number of base pairs in the duplex. When n = 1, the probability would be PSP. With the increase of n, the probability would decrease (because PSP, as a probability, has a value between 0 and 1). That is, the separation of the two strands would be more difficult if the base pairs are more. The introduction of the 1/2 corresponds to the consideration that self-folding of single chains may aid the dissociation of the duplex.

With the breaking of phosphodiester bonds, an RNA molecule may degrade into shorter ones (including nucleotides). When the breaking site of the chain is at a single-chain region, the breaking rate is PBB. When the breaking site is within a double-chain region, the two parallel bonds may break simultaneously, with the probability PBB3/2. The adoption of the index 3/2, instead of 2, corresponds to the consideration of the synergistic effect of the two breaking events.

As mentioned already, nucleotide residues at the end of an RNA chain (either at the 3'- or 5'-end) may undergo decay. With a similar consideration to RNA’s chain-breaking above, when the chain end is in a single-chain state, the decay rate is PNDE; when the chain end is in a double-strand state, the two paired nucleotide residues may decay simultaneously, with a probability of PNDE3/2, also taking into account the synergistic effect.

The probability of the movement of an RNA molecule is assumed to be PMN/m1/2, where m is the mass of the RNA, relative to a nucleotide. This assumption represents the consideration of the effect of the molecular size on the molecular movement. The square root was adopted here according to the Zimm model, concerning the diffusion coefficient of polymer molecules in solution [Citation74].

(Note: Source codes of the simulation program in C language can be obtained from Github – see Data Availability Statement. Besides the role of evidencing the reproducibility of the present study, the source codes present more details about the implementation of the model and may help readers to understand the simulation better)

Authors’ contributions

Conceptualization, W.M.; methodology, W.M. and C.Y.; investigation, Y.L. and M.L.; writing – original draft preparation, W.M.; writing – review and editing, M.L. and W.M.; funding acquisition, W.M. All authors have read and agreed to the published version of the manuscript.

Supplemental material

Supporting_Information.pdf

Download PDF (683.2 KB)

Supplementary Material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/15476286.2024.2380130

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

All relevant data are within the paper and its Supporting Information files. Source codes of the simulation program can be obtained from: https://github.com/mwt2001gh/Circular-genome-at-the-very-beginning/blob/main/Fig2a-Crep-2.cpp (corresponding to the case shown in ) and https://github.com/mwt2001gh/Circular-genome-at-the-very-beginning/blob/main/Fig7b-Crepnr.cpp (corresponding to the case shown in ).

Additional information

Funding

This study was supported by the National Natural Science Foundation of China [No. 31571367] (http://www.nsfc.gov.cn) and Natural Science Foundation of Hubei Province (CN) [No. 2019CFB685] (http://kjt.hubei.gov.cn) to W.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • Gilbert W. Origin of life: the RNA world. Nature. 1986;319(6055):618. doi: 10.1038/319618a0
  • Joyce GF. The antiquity of RNA-based evolution. Nature. 2002;418(6894):214–221. doi: 10.1038/418214a
  • Bernhardt HS. The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others). Biol Direct. 2012;7(1):23. doi: 10.1186/1745-6150-7-23
  • Higgs PG, Lehman N. The RNA world: molecular cooperation at the origins of life. Nat Rev Genet. 2015;16(1):7–17. doi: 10.1038/nrg3841
  • Luisi PL. About various definitions of life. Orig Life Evol Biosph. 1998;28(4/6):613–622. doi: 10.1023/A:1006517315105
  • Benner SA. Defining life. Astrobiology. 2010;10(10):1021–1030. doi: 10.1089/ast.2010.0524
  • Ma WT. The essence of life. Biol Direct. 2016;11(1):49. doi: 10.1186/s13062-016-0150-5
  • Ma WT. What does “the RNA World” mean to “the origin of life”? Life. 2017;7(4):49. doi: 10.3390/life7040049
  • Joyce GF, Orgel LE. Progress toward understanding the origin of the RNA world. In: Gesteland R, Cech T Atkins J, editors. The RNA world. (NY): CSHL Press; 2006. p. 23–56.
  • Kun Á, Szilágyi A, Könnyü B, et al. The dynamics of the RNA world: insights and challenges. Ann N Y Acad Sci. 2015;1341(1):75–95. doi: 10.1111/nyas.12700
  • Pressman A, Blanco C, Chen IA. The RNA world as a model system to study the origin of life. Curr Biol. 2015;25(19):R953–63. doi: 10.1016/j.cub.2015.06.016
  • Ma WT, Liang YZ. Investigating prebiotic protocells for an understanding of the origin of life: a comprehensive perspective combining the chemical, evolutionary and historical aspects. In: Fiore M, editor. Prebiotic chemistry and Life’s origin. Cambridge: Royal society of chemistry; 2022. p. 347–378.
  • Sutherland JD. The origin of life—out of the blue. Angew Chem Int Ed. 2016;55(1):104–121. doi: 10.1002/anie.201506585
  • Powner MW, Gerland B, Sutherland JD. Synthesis of activated pyrimidine ribonucleotides in prebiotically plausible conditions. Nature. 2009;459(7244):239–242. doi: 10.1038/nature08013
  • Powner MW, Sutherland JD, Szostak JW. The origin of nucleotides. Synlett. 2011;14(14):1956–1964. doi: 10.1055/s-0030-1261177
  • Szathmáry E. The origin of replicators and reproducers. Philos Trans R Soc B Biol Sci. 2006;361(1474):1761–1776. doi: 10.1098/rstb.2006.1912
  • Takeuchi N, Hogeweg P. Evolutionary dynamics of RNA-like replicator systems: a bioinformatic approach to the origin of life. Phys Life Rev. 2012;9(3):219–263. doi: 10.1016/j.plrev.2012.06.001
  • Szathmáry E, Maynard-Smith J. From replicators to reproducers: the first major transitions leading to life. J Theor Biol. 1997;187(4):555–571. doi: 10.1006/jtbi.1996.0389
  • Szathmáry E, Maynard-Smith J. The major evolutionary transitions. Nature. 1995;374(6519):227–232. doi: 10.1038/374227a0
  • Szathmáry E. Toward major evolutionary transitions theory 2.0. Proc Natl Acad Sci USA. 2015;112(33):10104–10111. doi: 10.1073/pnas.1421398112
  • Lasda E, Parker R. Circular RNAs: diversity of form and function. RNA. 2014;20(12):1829–1842. doi: 10.1261/rna.047126.114
  • Duran-Vila N, Elena SF, Daros JA, et al. Structure and evolution of viroids. In: Domingo E, Parish C Holland J, editors. Origin and evolution of viruses. Oxford: Elsevier; 2008. p. 43–64.
  • Winnik MA. End-to-end cyclization of polymer chains. Acc Chem Res. 1985;18(3):73–79. doi: 10.1021/ar00111a002
  • Müller S, Appel B. In vitro circularization of RNA. RNA Biol. 2017;14(8):1018–1027. doi: 10.1080/15476286.2016.1239009
  • Petkovic S, Müller S. RNA circularization strategies in vivo and in vitro. Nucleic Acids Res. 2015;43(4):2454–2465. doi: 10.1093/nar/gkv045
  • Ma WT, Yu CW, Zhang WT. Monte Carlo simulation of early molecular evolution in the RNA world. Biosystems. 2007;90(1):28–39. doi: 10.1016/j.biosystems.2006.06.005
  • Ma WT, Yu CW, Zhang WT, et al. Nucleotide synthetase ribozymes may have emerged first in the RNA world. RNA. 2007;13(11):2012–2019. doi: 10.1261/rna.658507
  • Ma WT, Yu CW, Zhang WT, et al. A simple template-dependent ligase ribozyme as the RNA replicase emerging first in the RNA world. Astrobiology. 2010;10(4):437–447. doi: 10.1089/ast.2009.0385
  • Chen Y, Ma WT, Morozov AV. The origin of biological homochirality along with the origin of life. PLOS Comp Biol. 2020;16(1):e1007592. doi: 10.1371/journal.pcbi.1007592
  • Ferris JP, Hill AR, Liu R, et al. Synthesis of long prebiotic oligomers on mineral surfaces. Nature. 1996;381(6577):59–61. doi: 10.1038/381059a0
  • Ferris JP. Montmorillonite catalysis of 30-50 mer oligonucleotides: laboratory demonstration of potential steps in the origin of the RNA world. Orig Life Evol Biosph. 2002;32(4):311–332. doi: 10.1023/A:1020543312109
  • Ertem G. Montmorillonite, oligonucleotides, RNA and origin of life. Orig Life Evol Biosph. 2004;34(6):549–570. doi: 10.1023/B:ORIG.0000043130.49790.a7
  • Franchi M, Gallori E. A surface-mediated origin of the RNA world: biogenic activities of clay-adsorbed RNA molecules. Gene. 2005;346:205–214. doi: 10.1016/j.gene.2004.11.002
  • Joyce GF, Orgel LE. Prospects for understanding the origin of the RNA world. In: Gesteland R, Cech T Atkins J, editors. The RNA world. (NY): CSHL Press; 1999. p. 49–77.
  • Szostak JW, Bartel DP, Luisi PL. Synthesizing life. Nature. 2001;409(6818):387–390. doi: 10.1038/35053176
  • Robertson MP, Joyce GF. The origins of the RNA world. Csh Perspect Biol. 2012;4(5):a003608. doi: 10.1101/cshperspect.a003608
  • Joyce GF, Szostak JW. Protocells and RNA self-replication. Csh Perspect Biol. 2018;10(9):a034801. doi: 10.1101/cshperspect.a034801
  • Bartel DP. Re-creating an RNA replicase. In: Gesteland R, Cech T Atkins J, editors. The RNA world. (NY): CSHL Press; 1999. p. 143–162.
  • Johnston WK, Unrau PJ, Lawrence MS, et al. RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science. 2001;292(5520):1319–1325. doi: 10.1126/science.1060786
  • Zaher HS, Unrau PJ. Selection of an improved RNA polymerase ribozyme with superior extension and fidelity. RNA. 2007;13(7):1017–1026. doi: 10.1261/rna.548807
  • Wochner A, Attwater J, Coulson A, et al. Ribozyme-catalyzed transcription of an active ribozyme. Science. 2011;332(6026):209–212. doi: 10.1126/science.1200752
  • Attwater J, Wochner A, Holliger P. In-ice evolution of RNA polymerase ribozyme activity. Nat Chem. 2013;5(12):1011–1018. doi: 10.1038/nchem.1781
  • Horning DP, Joyce GF. Amplification of RNA by an RNA polymerase ribozyme. Proc Natl Acad Sci USA. 2016;113(35):9786–9791. doi: 10.1073/pnas.1610103113
  • Cojocaru R, Unrau PJ. Processive RNA polymerization and promoter recognition in an RNA world. Science. 2021;371(6535):1225–1232. doi: 10.1126/science.abd9191
  • Szabó P, Scheuring I, Czárán T, et al. In silico simulations reveal that replicators with limited dispersal evolve towards higher efficiency and fidelity. Nature. 2002;420(6913):340–343. doi: 10.1038/nature01187
  • Unrau PJ, Bartel DP. RNA-catalyzed nucleotide synthesis. Nature. 1998;395(6699):260–263. doi: 10.1038/26193
  • Lau MWL, Cadieux KEC, Unrau PJ. Isolation of fast purine nucleotide synthase ribozymes. J Am Chem Soc. 2004;126(48):15686–15693. doi: 10.1021/ja045387a
  • Lau MWL, Unrau PJ. A promiscuous ribozyme promotes nucleotide synthesis in addition to ribose chemistry. Chem Biol. 2009;16(8):815–825. doi: 10.1016/j.chembiol.2009.07.005
  • Boza G, Szilágyi A, Kun K, et al. Evolution of the division of labor between genes and enzymes in the RNA world. PLOS Comp Biol. 2014;10(12):e1003936. doi: 10.1371/journal.pcbi.1003936
  • Takeuchi N, Hogeweg P, Kaneko K. The origin of a primordial genome through spontaneous symmetry breaking. Nat Commun. 2017;8(1):250. doi: 10.1038/s41467-017-00243-x
  • Takeuchi N, Hogeweg P, Koonin EV, et al. On the origin of DNA genomes: evolution of the division of labor between template and catalyst in model replicator systems. PLOS Comp Biol. 2011;7(3):e1002024. doi: 10.1371/journal.pcbi.1002024
  • Houseley J, Tollervey D. The many pathways of RNA degradation. Cell. 2009;136(4):763–776. doi: 10.1016/j.cell.2009.01.019
  • Hutchins CJ, Rathjen PD, Forster AC, et al. Self-cleavage of plus and minus RNA transcripts of avocado sunblotch viroid. Nucleic Acids Res. 1986;14(9):3627–3640. doi: 10.1093/nar/14.9.3627
  • Prody GA, Bakos JT, Buzayan JM, et al. Autolytic processing of dimeric plant virus satellite RNA. Science. 1986;231(4745):1577–1580. doi: 10.1126/science.231.4745.1577
  • Haseloff J, Gerlach WL. Simple RNA enzymes with new and highly specific endoribonuclease activities. Nature. 1988;334(6183):585–591. doi: 10.1038/334585a0
  • Diener TO. Circular RNAs: relics of precellular evolution? Proc Natl Acad Sci USA. 1989;86(23):9370–9374. doi: 10.1073/pnas.86.23.9370
  • Kristoffersen EL, Burman M, Noy A, et al. Rolling circle RNA synthesis catalyzed by RNA. Elife. 2022;11:e75186. doi: 10.7554/eLife.75186
  • Tupper AS, Higgs PG. Rolling-circle and strand-displacement mechanisms for non-enzymatic RNA replication at the time of the origin of life. J Theor Biol. 2021;527:110822. doi: 10.1016/j.jtbi.2021.110822
  • Rivera-Madrinan F, Di Iorio K, Higgs PG. Rolling circles as a means of encoding genes in the RNA world. Life-Basel. 2022;12(9):1373. doi: 10.3390/life12091373
  • Takeuchi N, Hogeweg P, Stormo GD. Multilevel selection in models of prebiotic evolution II: a direct comparison of compartmentalization and spatial self-organization. PLOS Comp Biol. 2009;5(10):e1000542. doi: 10.1371/journal.pcbi.1000542
  • Kim YE, Higgs PG, Wilke CO. Co-operation between polymerases and nucleotide synthetases in the RNA world. PLOS Comp Biol. 2016;12(11):e1005161. doi: 10.1371/journal.pcbi.1005161
  • Ma WT, Yu CW, Zhang WT. Circularity and self-cleavage as a strategy for the emergence of a chromosome in the RNA-based protocell. Biol Direct. 2013;8(1):21. doi: 10.1186/1745-6150-8-21
  • Maynard-Smith J, Szathmáry E. The origin of chromosomes I. Selection for linkage. J Theor Biol. 1993;164(4):437–446. doi: 10.1006/jtbi.1993.1165
  • Yin SL, Chen Y, Yu CW, et al. From molecular to cellular form: modeling the first major transition during the arising of life. BMC Evol Biol. 2019;19(1):84. doi: 10.1186/s12862-019-1412-5
  • Jauker M, Griesser H, Richert C. Copying of RNA sequences without pre-activation. Angew Chem Int Ed. 2015;54(48):14559–14563. doi: 10.1002/anie.201506592
  • Zhang SJ, Duzdevich D, Ding D, et al. Freeze-thaw cycles enable a prebiotically plausible and continuous pathway from nucleotide activation to nonenzymatic RNA copying. Proc Natl Acad Sci USA. 2022;119(17):e2116429119. doi: 10.1073/pnas.2116429119
  • Ding D, Zhang SJ, Szostak JW. Enhanced nonenzymatic RNA copying with in-situ activation of short oligonucleotides. Nucleic Acids Res. 2023;51(13):6528–6539. doi: 10.1093/nar/gkad439
  • Russell MJ, Hall AJ, Cairns-Smith AG, et al. Submarine hot spring and origin of life. Nature. 1988;336(6195):117. doi: 10.1038/336117a0
  • Martin W, Baross J, Kelley D, et al. Hydrothermal vents and the origin of life. Nat Rev Microbiol. 2008;6(11):805–814. doi: 10.1038/nrmicro1991
  • Colin-Garcia M, Heredia A, Cordero G, et al. Hydrothermal vents and prebiotic chemistry: a review. Boletin De La Soc Geologica Mex. 2016;68(3):599–620. doi: 10.18268/BSGM2016v68n3a13
  • Damer B, Deamer D. Coupled phases and combinatorial selection in fluctuating hydrothermal pools: a scenario to guide experimental approaches to the origin of cellular life. Life-Basel. 2015;5(1):872–887. doi: 10.3390/life5010872
  • Damer B, Deamer D. The hot spring hypothesis for an origin of life. Astrobiology. 2020;20(4):429–452. doi: 10.1089/ast.2019.2045
  • Liang YZ, Yu CW, Ma WT, et al. The automatic parameter-exploration with a machine-learning-like approach: powering the evolutionary modeling on the origin of life. PLOS Comp Biol. 2021;17(12):e1009761. doi: 10.1371/journal.pcbi.1009761
  • Zimm BH. Dynamics of polymer molecules in dilute solution: viscoelasticity, flow birefringence and dielectric loss. J Chem Phys. 2012;24(2):269–278. doi: 10.1063/1.1742462