Commentary

Dynamics of gene duplication and transposons in microbial genomes following a sudden environmental change

Pages 221-224 | Received 29 Aug 2011, Accepted 08 Sep 2011, Published online: 01 Oct 2011

Abstract

The impact of gene duplication on evolution is as ubiquitous as that of point mutation, but this realization is not yet reflected in our quantitative models of population genetics. In this Extra Views article, we explore the implications of quantitative models of gene duplication, specifically expanding on our previous work. We lay down a framework for understanding the impact of gene duplications on the evolution of a biological network and give an analytical argument, based on the concept of the mutational error threshold, for the necessity of gene duplications in the evolution of complex networks. In other words, by recognizing that the impact of mutations must scale appropriately in order to allow for the maintenance of complex networks, we develop a mathematical scaling argument that shows why gene duplication provides the types of mutations more favorable to increasing complexity. In the process of doing so, we seek to explain the relationship between per base pair mutation rates and genome size.

The evolutionary impact of gene duplication is evidenced by the abundant number and widespread nature of duplication mechanisms.Citation1 Gene duplication can be achieved internally as part of DNA repair, replication, and the activity of transposons, or externally through horizontal gene transfer. In general, duplication events occur at a rate that is an order of magnitude lower than that for point mutations, but with potentially much greater average impact on the phenotype of the organism per occurrence.Citation2 Gene family expansion, which occurs through gene duplications that have evolved new functions, has been shown to be prevalent in the branching of organismal lineagesCitation3 and copy-number variation has been correlated with the rate of sequence evolution.Citation4 The effect of gene duplication on the evolution of phenotypes is at least as ubiquitous as that of point mutation and yet has not been absorbed into our models of population genetics to the same degree.

Models of the role of gene duplication in gene family expansion have been refined continually since Ohno's 1970 model of duplication with neutral mutation.Citation3 More recently, an examination of the rates involved in gene duplication and deletion has indicated that the inferred rates do not allow for sufficient neutral mutation to explain the rates at which novel functionality arises via duplication.Citation2 Furthermore, measurement of the fitness contributions of genes from a whole-genome duplication in yeast suggests that most of the duplicated genes are important today and were important at the time of duplication.Citation5 Instead, a model of continuous selection with enhanced expression via gene duplication has been proposed as an alternative model for the adaptation of novel functionality.Citation2

In recent work, we quantitatively modeled the dynamics of gene duplication and found that the generic features of continuous selection with gene duplication produce the expected speedup in the evolution of novel genes.Citation6 Furthermore, by positing that transposons were the vehicles of gene duplication, we were able to link our results to changes in transposon density in the evolution of obligate species.Citation7,Citation8 As further validation that transposon density is a good measure of gene duplication, we explicitly modeled the dynamics of transposons and horizontal gene transfers and simulated the interplay between the two that allows for the upkeep of genome plasticity in changing environments.Citation9

The power of such generic quantitative models is that they capture the key features of a stochastic process without overfitting to unmeasured parameters. In fact, the value of these models as descriptions of widespread mechanisms throughout biology increases as they become less sensitive to the particular parameter choices. This insensitivity appears to hold true experimentally for gene duplication ratesCitation10 and is something we explicitly demonstrate for our model.Citation6 We can reasonably use such a model to infer basic quantities despite the lack of measurements for all biological rates and constants. In order to estimate the time it takes to produce a novel gene, we simulated adaptation to novel sequences as defined in our previous work in reference Citation6. Assuming a generation time of 1 h per replication, a population size of 10Citation10 and the rate of meaningful point mutations given by Bergthorsson et al., a rough calculation shows that it takes approximately 2.4 million years for the evolution of novel functionality, a result that is consistent with the 0.3–4 million year time estimates obtained from genomic comparisons of free-living and obligate species of Bordetella.Citation11

Despite the plausibility of continuous selection with gene duplication as a major pathway to novel functionality and the evolutionary speedup it explains, the rate estimates still underestimate the rate of evolution seen among antibiotic resistance genes and in the progression of cancer.Citation12–Citation15 Antibiotic resistance in human pathogens appears to have evolved de novo within the past 80 years.Citation12,Citation16 Cancer progresses within an individual's lifetime from an initially small population of variant phenotypes.Citation13,Citation15 Furthermore, the estimated rates are based on simulating the evolution toward one sequence target, whereas in a typical genome there are on the order of a thousand genes. While these considerations are taken into account by modifying the gene duplication and deletion rates to the appropriate rates for a particular gene, they cannot take into account the effect of gene duplications and deletions in other parts of the genome on the fitness of the organism.

The answer to this dilemma may lie in considering elements that are capable of enhancing the rate of evolution even further. Indeed, in the cases of both antibiotic resistance and cancer, evidence suggests that transposons and other mobile genetic elementsCitation12,Citation17 or higher site-specific duplication rates due to genomic instabilityCitation1,Citation18,Citation19 are involved. In this Extra Views article, we provide a theoretical framework for understanding the role of gene duplication and rearrangements and hypothesize that site-specific gene duplication rates can be tuned through selection to provide the further enhancement to the rate of evolution of novel functionality that is necessary to explain the pace of evolution we observe in medicine.

Gene Duplication as a Driver of Network Evolution

Genes are essentially sequences that encode biomolecules such as proteins or RNA. The collection of genes in an organism gives rise to its phenotype. Understanding the evolution of phenotypes requires the ability to link genes with their interplay and impact on a biological network of interacting biomolecules. In this context, proteins are nodes in a directed graph that regulate each other by inhibition or activation. Duplications are capable of changing the network topology. Gene amplification can be thought of as an increase in the dosage or activity rate of a particular node. Gene rearrangements to new loci with different promoter regions result in changes in connectivity to various regulatory elements within the network. Likewise, domain swapping between protein-encoding genes, which also occurs through gene duplication mechanisms, provides yet another means of directly changing the nature of the interactions.
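The network picture described above can be made concrete with a small data structure. The following Python sketch is illustrative only: the class `RegulatoryNetwork` and its method names are hypothetical and do not come from our earlier simulations. It represents genes as nodes of a directed graph with signed links (+1 for activation, -1 for inhibition) and shows how amplification, duplication into a new node, and rearrangement each act on the graph.

```python
from dataclasses import dataclass, field


@dataclass
class RegulatoryNetwork:
    """Toy directed, signed regulatory graph (hypothetical illustration)."""
    # edges[(regulator, target)] = +1 (activation) or -1 (inhibition)
    edges: dict = field(default_factory=dict)
    # dosage[gene] = number of copies contributing to that node's activity
    dosage: dict = field(default_factory=dict)

    def add_gene(self, name):
        self.dosage.setdefault(name, 1)

    def add_link(self, regulator, target, sign):
        self.add_gene(regulator)
        self.add_gene(target)
        self.edges[(regulator, target)] = sign

    def amplify(self, gene):
        """Gene amplification: same node, higher dosage."""
        self.dosage[gene] += 1

    def duplicate_as_new_node(self, gene, copy_name):
        """Duplication producing a new node that inherits the parent's links."""
        self.add_gene(copy_name)
        for (regulator, target), sign in list(self.edges.items()):
            if regulator == gene:
                self.edges[(copy_name, target)] = sign
            if target == gene:
                self.edges[(regulator, copy_name)] = sign

    def rearrange(self, gene, old_regulator, new_regulator):
        """Rearrangement to a new locus: the gene changes its regulator."""
        sign = self.edges.pop((old_regulator, gene))
        self.add_gene(new_regulator)
        self.edges[(new_regulator, gene)] = sign


net = RegulatoryNetwork()
net.add_link("A", "B", +1)   # A activates B
net.add_link("B", "C", -1)   # B inhibits C
net.duplicate_as_new_node("B", "B2")
net.amplify("A")
print(sorted(net.edges.items()), net.dosage)
```

In this sketch a single duplication event creates several links at once, which is the sense in which one duplication can act like many point mutations on the network.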

Ultimately, a single gene duplication is equivalent in effect to many point mutations. In our previous work in reference Citation6, we showed how gene duplication sped up adaptation toward particular sequences. However, a more realistic representation of the biological problem would be to embed those sequences in a reaction network and select for a novel network function. By doing so, we would more accurately represent the mutational space and take into account the effect of duplications and deletions of various genes throughout a genome. Adaptation via gene duplication and rearrangements provides more relevant mutations to a network than point mutation alone and should be expected to lead to an enhanced rate of adaptation in comparison to previous sequence-based models.

In order to understand why gene duplication should be more favorable to the evolution of a biological network, we consider the error thresholdCitation20 effects from point mutation and gene duplication separately. The error threshold is the point at which mutation exceeds the ability of selection pressure to maintain a beneficial mutation in the population.Citation20 It also defines the degree of optimality a population will be able to reach in light of mutational processes. We use the concept of error threshold to highlight the advantages of gene duplication and why mutation by gene duplication mechanisms scales properly for network evolution. These same arguments could be applied to horizontal gene transfers.

Consider the context of a population adapted to a particular optimum within a fitness landscape. In general, in organisms that possess a beneficial phenotype, a fraction $p_L$ of their offspring will lose it due to mutation. This fraction is one minus the probability that the sequence has no harmful mutation. If the per-base mutation rate is $m$ and the number of possible harmful mutations is $g_h$, then:

$$p_L = 1 - (1 - m)^{g_h}$$

The replication rate of offspring is then $r_+$ for those with the beneficial mutation and $r_0$ for those without. The population equations are:

$$\frac{dP_+}{dt} = r_+ (1 - m)^{g_h} P_+$$

$$\frac{dP_0}{dt} = r_0 P_0 + \left(1 - (1 - m)^{g_h}\right) P_+$$

The error threshold is reached when these two populations grow at the same rate, i.e., when the fraction of beneficial mutants in the population is no higher than that of harmful ones.
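As a rough numerical illustration of this threshold, the population equations can be integrated directly. The Python sketch below uses a simple forward Euler scheme; the function name `beneficial_fraction`, the parameter values, the initial condition (all individuals start with the beneficial phenotype) and the integration settings are assumptions chosen for illustration, not values taken from our simulations.

```python
import math


def beneficial_fraction(r_plus, r_zero, m, g_h, t_end=50.0, dt=0.01):
    """Integrate the two population equations with forward Euler and
    return the final fraction P+ / (P+ + P0)."""
    keep = (1.0 - m) ** g_h        # probability of no harmful mutation
    p_plus, p_zero = 1.0, 0.0      # assumed initial condition
    for _ in range(int(round(t_end / dt))):
        d_plus = r_plus * keep * p_plus
        d_zero = r_zero * p_zero + (1.0 - keep) * p_plus
        p_plus += dt * d_plus
        p_zero += dt * d_zero
    return p_plus / (p_plus + p_zero)


# Illustrative parameters (assumed): 50% replication advantage, 100 harmful sites.
r_plus, r_zero, g_h = 1.5, 1.0, 100
# Mutation rate at which r_plus * (1 - m)**g_h equals r_zero.
m_star = 1.0 - math.exp(-math.log(r_plus / r_zero) / g_h)

for factor in (0.5, 2.0):
    frac = beneficial_fraction(r_plus, r_zero, factor * m_star, g_h)
    print(f"m = {factor} x threshold: beneficial fraction = {frac:.3g}")
```

With these assumed parameters, a mutation rate below the threshold leaves the beneficial phenotype at a finite fraction of the population, whereas a rate above it drives that fraction toward zero.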

If we assume that we are in this limit (i.e., $P_+ = P_0$), then this is when:

$$r_+ (1 - m)^{g_h} \approx r_0 \quad \Rightarrow \quad m = 1 - \exp\left(-\frac{\log(r_+/r_0)}{g_h}\right)$$

If $g_h$ is large, we can expand this in a Taylor series:

$$m \approx \frac{1}{g_h} \log\left(\frac{r_+}{r_0}\right)$$
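To see how quickly the large-$g_h$ expansion becomes accurate, the exact and approximate threshold mutation rates can be compared numerically. This is a minimal Python sketch; the function names and the replication rates used are assumptions for illustration only.

```python
import math


def threshold_exact(r_plus, r_zero, g_h):
    """Exact threshold: m = 1 - exp(-log(r+/r0) / g_h)."""
    return 1.0 - math.exp(-math.log(r_plus / r_zero) / g_h)


def threshold_approx(r_plus, r_zero, g_h):
    """Leading-order expansion for large g_h: m = log(r+/r0) / g_h."""
    return math.log(r_plus / r_zero) / g_h


r_plus, r_zero = 1.5, 1.0  # assumed replication rates
for g_h in (10, 100, 1000, 10000):
    exact = threshold_exact(r_plus, r_zero, g_h)
    approx = threshold_approx(r_plus, r_zero, g_h)
    # exact * g_h is the tolerated number of harmful mutations per genome
    print(g_h, exact, approx, exact * g_h)
```

The product of the threshold rate and $g_h$ approaches the constant $\log(r_+/r_0)$ as $g_h$ grows, which is the statement that the total number of mutations per generation, rather than the per-base rate, is what selection constrains.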

This generally implies that the mutation rate is limited by the sequence length: it is the total number of mutations per generation that is the meaningful quantity, rather than the number of mutations per base. Interestingly, this appears to be reflected in the spontaneous mutation rates in microbes and viruses, where the per-base spontaneous mutation rate scales inversely with genome size,Citation21,Citation22 although the exact opposite trend appears to be prevalent, especially in the multicellular eukaryotes.Citation22

Let us now consider the example of a regulatory network. If we imagine that we have $N$ elements in this network, there are $N^2$ possible links. If we assume that the intrinsic mutation rate is per possible link (i.e., every generation there is a chance $m$ of forming every possible link), then we have that $g_h \propto N^2$ and we find that we must lower the mutation rate accordingly, in proportion to $1/N^2$. Consider the case where increasing the number of elements in the network allows for greater fitness through more complex and well-regulated responses to different conditions. If we approximate the effect of each additional node by an average positive contribution to fitness, the rate of evolution of this network is constrained to go as

$$\frac{dN}{dt} \propto \frac{1}{N^2}$$

so that in a purely point mutation framework, the rate of evolution of networks slows down and an organism's regulatory network can never grow beyond some fixed size.
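The scaling of the tolerable per-link mutation rate with network size can be tabulated in a few lines. The Python sketch below combines the large-$g_h$ threshold expression with $g_h = N^2$; setting the proportionality constant between $g_h$ and $N^2$ to one, along with the function names and replication rates, is an assumption made purely for illustration.

```python
import math


def possible_links(n_nodes):
    """Number of possible directed links among n_nodes elements."""
    return n_nodes * n_nodes


def max_mutation_rate(n_nodes, r_plus, r_zero):
    """Approximate threshold rate per link, taking g_h = N^2 (assumed constant of 1)."""
    return math.log(r_plus / r_zero) / possible_links(n_nodes)


r_plus, r_zero = 1.5, 1.0  # assumed replication rates
for n in (10, 20, 40, 80):
    print(n, possible_links(n), max_mutation_rate(n, r_plus, r_zero))
```

Each doubling of the network size reduces the tolerable per-link mutation rate by a factor of four in this sketch.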

There are two ways to escape this trap. The first is to use a different mutational mechanism, such as gene duplication or horizontal gene transfer, in which the impact of mutations increases as the network grows. For example, if the rate of discovery of novel beneficial mutations scales with N, then the network size can grow logarithmically. This seems to be the case in microbes and viruses, where both the number of spontaneous mutationsCitation21 and the number of gene duplications or horizontal gene transfer events seem to scale on a per cell basis.Citation9 The prevalence of gene duplication in multicellular eukaryotesCitation1 also indicates the utility of these larger mutational steps for evolution in multicellular organisms, though there appear to be other factors that affect the rate of spontaneous mutations.Citation22

The second solution is for the number of deleterious mutations $g_h$ to scale more slowly than the number of network connections $N^2$. This could be accomplished by targeted mutations, i.e., a scheme that lends itself to a smaller fraction of deleterious mutations, for example through network modularity in which connections within modules mutate at a greater rate than connections between modules. If the network evolution is modular, then not all $N^2$ possible links need be subject to mutation at a given time. If the network is broken down into a hierarchy of groups, each of which has a different rate of mutation for internal links than external links, then the network can consist of some number of modules of smaller size $M$. Adding or removing a member is then approximately a local process within the module, and so the number of potential in-group links scales as $M^2$ in each of the $N/M$ modules, for a total of $(N/M) \cdot M^2 = NM$. This means that the mutation rate need only scale as $1/(NM)$, not $1/N^2$. For small $M$, this effectively means that the limitation introduced by the mutation rate scales as approximately $1/N$, allowing for the evolution of larger networks.
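The counting behind the modular argument can be checked with a short calculation. In this Python sketch, the function names and the example sizes are hypothetical, and the network is assumed to split evenly into modules with only in-module links subject to mutation.

```python
def full_links(n_nodes):
    """Possible links when every pair of nodes may be connected."""
    return n_nodes ** 2


def modular_links(n_nodes, module_size):
    """Possible in-module links for N/M modules of M nodes each.

    Assumes module_size divides n_nodes evenly.
    """
    n_modules = n_nodes // module_size
    return n_modules * module_size ** 2


n_nodes = 1000
for module_size in (5, 10, 50):
    full = full_links(n_nodes)
    modular = modular_links(n_nodes, module_size)
    # The ratio full / modular equals N / M, the relief on the mutation rate.
    print(module_size, full, modular, full // modular)
```

For a fixed module size, the number of mutable links grows only linearly in the network size, which is the origin of the approximate $1/N$ constraint on the mutation rate.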

Multicellular eukaryotes, in addition to gene duplication, could employ a modularity scheme enabling a greater complexity than would be possible otherwise. If such an organization were inherent in the unicellular eukaryotes, this may have played a role in the multiple independent evolutions of multicellularity within the eukaryotic lineage to the exclusion of the bacteria and archaea. The greater the modularity, the weaker the constraint on the spontaneous mutation rate. This provides an alternative explanation for the increasing rate of spontaneous mutation in multicellular eukaryotes.

In such a scheme, the rate of change to the interactions within the sub-network making up each module would have to be higher than the rate of change to the interactions between modules. This implies some sort of targeted evolution. For point mutations, it has been shown that the rate of mutation can vary according to conditions such as cell stressCitation23 and that mutation rate is subject to tuning via the process of selection.Citation24 In other words, both the means and the viability of metaevolution, i.e., the evolution of evolvability, have been identified, suggesting that organisms have evolved to tune their levels of adaptability as a function of time. In the context of stress-induced mutagenesis, this means that just as an organism finds itself at a disadvantage (stress), part of its population begins undergoing rapid mutation, increasing the probability that a small subpopulation finds a successful phenotype through a series of mutations.Citation23 Similarly, gene duplication rates within a genome have been shown to differ in a site-specific manner due to genomic instabilities.Citation1,Citation18,Citation25 Site-specific duplication rates provide the means for tuning the evolvability within particular modules of a network in a heritable manner. This allows for network metaevolution in a manner that would satisfy the constraints for evolutionary modularity we describe.

The Role of Gene Duplication in Human Evolution

The sequencing of the human genome revealed the immense impact gene duplication has had on our own genomes.Citation26 Since then, the genetic mechanisms underpinning a wide variety of human diseases, including cancer,Citation29 have been correlated with copy-number variation.Citation27,Citation28 Despite the deleterious nature of these copy-number variations, these duplications and deletions may represent vestiges of an adaptive phase of human evolution. Many of these diseases are related to neurological or mental disorders. The relatively high rate of copy-number variation among some of these genesCitation18,Citation25 indicates that these genes may have played a role in the evolution of the human brain during the burst in gene duplications in the primate lineage leading up to humans.Citation30 Interestingly, while neurological function seems to be subject to a large set of variations in copy number, the genomic regions containing the HOX and ZFHX4 regulatory proteins, which have key roles in developmental processes, appear to be strikingly devoid of any interspersed repeats.Citation26 While there is currently no apparent advantage to maintaining high rates of gene duplication and deletion for the genes related to genetic disorders, selection on metaevolutionary traits acts more slowlyCitation24 and genomic instabilities may still be in the process of stabilizing after having reached an optimal copy number.

In conclusion, we reiterate the importance of elucidating the impact of gene duplication on evolution through quantitative modeling. Such modeling efforts allow us to better understand which mechanisms are sufficient and necessary to explain the rates and adaptive roles of gene duplication throughout evolutionary history. In this Extra Views article, we point out that additional features that enhance the adaptive rate are necessary to explain the rate at which novel function evolves. We then propose that evolution in a network-based framework naturally favors gene duplication and that this could potentially explain at least part of the difference between the observed rates and those estimated using our sequence-based simulations. In addition, we point out how other features of gene duplication, such as enhanced rates of duplication and deletion at sites of genomic instability, could further enhance the rate of adaptation. Quantitatively modeling these effects would provide not only a generic framework for understanding the general features of adaptation via gene duplication but could also potentially serve as a predictive tool for medical risk. This area is especially ripe because both metabolic and regulatory models for human cells already exist and because of the rapidly growing amounts of data on the evolutionary genetics of cancer progression.Citation13–Citation15

Acknowledgments

N.C. thanks the Institute for Genomic Biology for financial support.

References

  • Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet 2009; 10:551 - 564; PMID: 19597530; http://dx.doi.org/10.1038/nrg2593
  • Bergthorsson U, Andersson DI, Roth JR. Ohno's dilemma: Evolution of new genes under continuous selection. Proc Natl Acad Sci USA 2007; 104:17004 - 17009; PMID: 17942681; http://dx.doi.org/10.1073/pnas.0707158104
  • Ohno S. Evolution by gene duplication. London: George Allen & Unwin Ltd; Berlin, Heidelberg and New York: Springer-Verlag; 1970
  • Chen FC, Chen CJ, Li WH, Chuang TJ. Gene family size conservation is a good indicator of evolutionary rates. Mol Biol Evol 2010; 27:1750; PMID: 20194423; http://dx.doi.org/10.1093/molbev/msq055
  • DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colón-González M, Chao S, et al. Exposing the fitness contribution of duplicated genes. Nat Genet 2008; 40:676 - 681; PMID: 18408719; http://dx.doi.org/10.1038/ng.123
  • Chia N, Goldenfeld N. Dynamics of gene duplication and transposons in microbial genomes following a sudden environmental change. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 83:021906; PMID: 21405862; http://dx.doi.org/10.1103/PhysRevE.83.021906
  • Moran NA, Plague GR. Genomic changes following host restriction in bacteria. Curr Opin Genet Dev 2004; 14:627 - 633; PMID: 15531157; http://dx.doi.org/10.1016/j.gde.2004.09.003
  • Plague GR, Dunbar HE, Tran PL, Moran NA. Extensive proliferation of transposable elements in heritable bacterial symbionts. J Bacteriol 2007; 1:1082 - 1087; PMID: 17981967; http://dx.doi.org/10.1128/JB.01082-07
  • Chia N, Goldenfeld N. Statistical mechanics of horizontal gene transfer in evolutionary ecology. J Stat Phys 2010; 1 - 15
  • Reams AB, Kofoid E, Savageau M, Roth JR. Duplication frequency in a population of Salmonella enterica rapidly approaches steady state with or without recombination. Genetics 2010; 184:1077; PMID: 20083614; http://dx.doi.org/10.1534/genetics.109.111963
  • Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, Harris DE, et al. Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 2003; 35:32 - 40; PMID: 12910271; http://dx.doi.org/10.1038/ng1227
  • Aminov RI, Mackie RI. Evolution and ecology of antibiotic resistance genes. FEMS Microbiol Lett 2007; 271:147 - 161; PMID: 17490428; http://dx.doi.org/10.1111/j.1574-6968.2007.00757.x
  • Shibata D. Mutation and epigenetic molecular clocks in cancer. Carcinogenesis 2011; 32:123; PMID: 21076057; http://dx.doi.org/10.1093/carcin/bgq239
  • Lambert G, Estévez-Salmeron L, Oh S, Liao D, Emerson BM, Tlsty TD, et al. An analogy between the evolution of drug resistance in bacterial communities and malignant tissues. Nat Rev Cancer 2011; 11:375 - 382; PMID: 21508974; http://dx.doi.org/10.1038/nrc3039
  • Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, et al. Tumour evolution inferred by single-cell sequencing. Nature 2011; 472:90 - 94; PMID: 21399628; http://dx.doi.org/10.1038/nature09807
  • Sandegren L, Andersson DI. Bacterial gene amplification: implications for the evolution of antibiotic resistance. Nat Rev Microbiol 2009; 7:578 - 588; PMID: 19609259; http://dx.doi.org/10.1038/nrmicro2174
  • Novick RP, Christie GE, Penadés JR. The phage-related chromosomal islands of Gram-positive bacteria. Nat Rev Microbiol 2010; 8:541 - 551; PMID: 20634809; http://dx.doi.org/10.1038/nrmicro2393
  • Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, et al. Mapping copy number variation by population-scale genome sequencing. Nature 2011; 470:59 - 65; PMID: 21293372; http://dx.doi.org/10.1038/nature09708
  • Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 2009; 461:809 - 813; PMID: 19812674; http://dx.doi.org/10.1038/nature08489
  • Eigen M. Molecular self-organization and the early stages of evolution. Q Rev Biophys 1971; 4:149 - 212; PMID: 5134461; http://dx.doi.org/10.1017/S0033583500000627
  • Drake JW. A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA 1991; 88:7160; PMID: 1831267; http://dx.doi.org/10.1073/pnas.88.16.7160
  • Lynch M. Evolution of the mutation rate. Trends Genet 2010; 26:345 - 352; PMID: 20594608; http://dx.doi.org/10.1016/j.tig.2010.05.003
  • Rosenberg SM. Evolving responsively: adaptive mutation. Nat Rev Genet 2001; 2:504 - 515; PMID: 11433357; http://dx.doi.org/10.1038/35080556
  • Earl DJ, Deem MW. Evolvability is a selectable trait. Proc Natl Acad Sci USA 2004; 101:11531; PMID: 15289608; http://dx.doi.org/10.1073/pnas.0404656101
  • Perry GH, Tchinda J, McGrath SD, Zhang J, Picker SR, Cáceres AM, et al. Hotspots for copy number variation in chimpanzees and humans. Proc Natl Acad Sci USA 2006; 103:8006 - 8011; PMID: 16702545; http://dx.doi.org/10.1073/pnas.0602318103
  • Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001; 409:860 - 921; PMID: 11237011; http://dx.doi.org/10.1038/35057062
  • McCarroll SA, Altshuler DM. Copy-number variation and association studies of human disease. Nat Genet 2007; 39:37 - 42; PMID: 17597780; http://dx.doi.org/10.1038/ng2080
  • Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, disease and evolution. Annu Rev Genomics Hum Genet 2009; 10:451 - 481; PMID: 19715442; http://dx.doi.org/10.1146/annurev.genom.9.081307.164217
  • Pollack JR, Sørlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci USA 2002; 99:12963; PMID: 12297621; http://dx.doi.org/10.1073/pnas.162471999
  • Dumas L, Kim YH, Karimpour-Fard A, Cox M, Hopkins J, Pollack JR, et al. Gene copy number variation spanning 60 million years of human and primate evolution. Genome Res 2007; 17:1266; PMID: 17666543; http://dx.doi.org/10.1101/gr.6557307