1,340
Views
9
CrossRef citations to date
0
Altmetric
Original Articles

Large-scale sequence analysis reveals novel human-adaptive markers in PB2 segment of seasonal influenza A viruses

, , , , , , ORCID Icon, ORCID Icon, ORCID Icon, ORCID Icon, & show all
Pages 1-12 | Received 22 Dec 2017, Accepted 18 Feb 2018, Published online: 29 Mar 2018

Abstract

To elucidate the adaptive strategies of influenza A viruses (IAVs) to human, we proposed a computational approach to identify human-adaptive mutations in seasonal IAVs, which have not been analyzed comprehensively. We compared representative PB2 sequences of 1425 avian IAVs and 2176 human IAVs and identified a total of 42 human-adaptive markers, including 28 and 31 markers in PB2 proteins of seasonal viruses H1N1 and H3N2, respectively. Notably, this comprehensive list encompasses almost all the markers identified in prior computational studies and 21 novel markers including an experimentally verified mutation K526R, suggesting the predictive power of our method. The strength of our analysis derives from the enormous amount of recently available sequences as well as the recognition that human-adaptive mutations are not necessarily conserved across subtypes. We also utilized mutual information to profile the inter-residue coevolution in PB2 protein. A total of 35 and 46 coevolving site pairs are identified in H1N1 and H3N2, respectively. Interestingly, 13 out of the 28 (46.4%) identified markers in H1N1 and 16 out of the 31 (51.6%) in H3N2 are embraced in the coevolving pairs. Many of them are paired with well-characterized human-adaptive mutations, indicating potential epistatic effect of these coevolving residues in human adaptation. Additionally, we reconstructed the PB2 evolutionary history of seasonal IAVs and demonstrated the distinct adaptive pathway of PB2 segment after reassortment from H1 to H3 lineage. Our study may provide clues for further experimental validation of human-adaptive mutations and shed light on the human adaptation process of seasonal IAVs.

Introduction

Influenza A virus (IAV) genome consists of 8 negative-sense RNA segments encoding 11–12 proteinsCitation1. The transcription and replication of influenza viruses are catalyzed by the viral polymerase complex composed of three subunits: polymerase basic protein 2 (PB2), polymerase basic protein 1 (PB1), and polymerase acidic protein (PA). The PB2 subunit binds the 5′ 7-methylguanosine cap of host pre-mRNAs, which are subsequently cleaved off 10–15 nucleotides downstream by PACitation2,Citation3. PB1 protein, a viral RNA-dependent RNA polymerase, catalyzes the addition of nucleotides to the resulting capped short RNA primer and initiates viral transcription. The cooperation between the polymerase complex subunits is essential for viral replication and transcriptionCitation4.

Migratory waterfowl is the natural reservoir of avian IAVs, from which IAVs are transmitted into other hosts, such as humans, domestic poultry, swine, and other speciesCitation5. The host spectrum of influenza virus is mainly dictated by hemagglutinin (HA) glycoprotein and the viral polymerase complexCitation4. Specific mutations in the receptor-binding domain of HA alter the specificity and affinity for the receptor and affect host tropism, while the mutations of viral polymerase proteins influence viral replication efficiency in new hostsCitation1,Citation4. Particularly, a single PB2 mutation E627K can cause highly increased replication efficiency and enhanced pathogenicityCitation6Citation8, enabling the replication of avian-origin IAVs in human cells. In addition, the PB2 mutations D701NCitation9,Citation10, G590S/Q591RCitation11,Citation12, and K526RCitation13 significantly increase replication efficiency in mammalian hosts.

Studies of human adaptation have been extended to other viral proteins. For example, T85I, G186S, and L336M in PA protein were identified to increase the polymerase activity of 2009 pandemic H1N1 (pH1N1) virusCitation14. Amino acids L473V and L598P of PB1 protein from an avian-origin IAV contributed to higher polymerase activity, especially in mammalian cellsCitation15. Non-structural protein 1 mutations F103L and M106I led to increased viral growth of a human H5N1 isolate in vitro (mouse and canine cells) and enhanced virulence in miceCitation16. Furthermore, epistatic effects of combinatorial mutations have been observed in IAV studies. Epistasis describes non-additive interactions among genetic sites, namely, the consequence of a mutation at one site depends on the presence of mutations at other sites. Epistasis commonly exists and plays an important role in immune escape and drug resistance in various pathogensCitation17. In terms of the epistasis in IAVs, three PB2 mutations I147T, K399T, and A588T showed marginal effect when individually introduced into H5N1 but highly increased the polymerase activity when introduced in combinationCitation18.

To date, most of the human-adaptive mutations have been identified from epidemic or pandemic influenza virus isolates. However, those in seasonal human IAVs such as H1N1 and H3N2 subtypes remain poorly investigated. To fill the knowledge gap, we conducted a large-scale sequence analysis to identify the potential human-adaptive mutations in seasonal IAV PB2 protein and infer the adaptive evolutionary history of seasonal IAVs.

Results

Distribution of isolates and subtypes

On the basis that PB2 segments of seasonal H1N1 and H3N2 are both derivatives of 1918 pandemic H1N1 virusCitation1, we aim to compare the PB2 segments of seasonal IAVs to avian IAVs for the discovery of potential adaptive markers in seasonal human IAVs. We surveyed 3457 and 6690 PB2 sequences of avian and human IAVs, respectively, with collecting years spanning from 1918 to 2016, in order to obtain a large pool of viral sequences. The distributions of collecting date and subtype were calculated. Avian IAVs are more diversified with 86 subtypes compared to 9 subtypes of human IAVs. The three major subtypes of avian viruses are H5N1 (16.7%), H3N8 (7.7%), and H6N2 (7.3%), while H1N1 (62.1%) and H3N2 (35.5%) are the dominant subtypes in human IAVs. We evaluated and eliminated sampling biases in both avian and human viruses, such as the oversampled H5N1 subtype in avian IAVs and the 2009 pandemic H1N1 in human IAVs. As a result, a total of 1425 avian and 2176 (1086 H1N1 and 1090 H3N2) human IAV sequences were retained for downstream analyses.

Sites with host-specific amino acids in PB2

In order to identify human-adaptive markers in the PB2 protein, we compared the sequences of avian and seasonal human IAVs. Considering the genetic distinctions, seasonal H1N1 and H3N2 were compared to avian viruses separately. Using our comparative method, 28 and 31 human-adaptive markers in seasonal H1N1 (Supplementary Table S1) and H3N2 (Supplementary Table S2) were identified, respectively. Next, we asked whether any of the identified markers have been experimentally verified in previous studies, which may reflect the validity of our analysis. To this end, we compiled known experimentally validated human-adaptive mutations through an extensive literature review. As demonstrated in Table , to date, a total of 23 mutations in PB2 protein have been reported to increase replication efficiency and/or enhance pathogenicity significantly. We found that seasonal IAVs H1N1 and H3N2 harbour six and eight verified human-adaptive mutations, respectively. Among them, six mutations including D9N, A199S, T271A, A588I, E627K, and K702R are common in H1N1 and H3N2; while two mutations K526R and A684S are specific in H3N2, implicating that human-adaptive mutations are not necessarily conserved across subtypes as described in previous studiesCitation19,Citation20. Additionally, we identified 12 markers, A44S, M64T, T81M, T105V, I292T, R368K, L475M, D567N, T569A, V613T, A674T, and G682S, which have been proposed to be human-adaptive markers in similar computational studiesCitation19Citation22. Notably, the other 11 markers, including T106A, V109I, V114I, I354L, R355T, A395V, I399V, Q447L, S490N, T491A, and V547I, in H1N1 and 10 markers, including I67V, N82S, E120D, Q194R, V227I, I382V, P453H, N456S, I463V, and T676I, in H3N2 have never been documented. A661T was excluded from both H1N1 and H3N2, since it was recently reported to have negligible effect on polymerase activityCitation23. More specifically, among the 11 novel markers in H1N1, T106A, V109I, and V114I are located in Nter domain; I354L, R355T, A395V, I399V, and Q447L in cap-binding domain; S490N and T491A in cap-627 linker domain; and V547I in 627 domain. Comparatively, among the 10 novel markers in H3N2, I67V, N82S, E120D, and V227I are located in Nter domain; Q194R in Lid domain; I382V, P453H, N456S, and I463V in cap-binding domain; and T676I in 627 domain (Fig. ). Taken together, by comparing the amino acid distribution of PB2 protein between avian and human IAVs, we identified 28 and 31 human-adaptive markers in H1N1 and H3N2 subtypes, respectively. More than half of them have been either experimentally verified or repeatedly predicted in previous computational studies. Additionally, we also pinpointed novel markers of human adaptation that reside in well-defined functional domains of PB2 protein.

List of experimentally verified PB2 human-adaptive mutations

Fig. 1 Host-specific signatures of seasonal IAVs in PB2 protein.

Localization of identified human-adaptive markers in the PB2 domains of seasonal IAVs H1N1 (a) and H3N2 (b). The cartoon representation of PB2 subunit was rendered by Pymol (PDB code: 4WSB). Domains are colored according to the color code in c. Experimentally verified markers are indicated in red sphere, computationally predicted markers are indicated in green, and novel markers are indicated in blue. c The positions of PB2 domains are color-coded and labeled according to their functions

Fig. 1 Host-specific signatures of seasonal IAVs in PB2 protein.Localization of identified human-adaptive markers in the PB2 domains of seasonal IAVs H1N1 (a) and H3N2 (b). The cartoon representation of PB2 subunit was rendered by Pymol (PDB code: 4WSB). Domains are colored according to the color code in c. Experimentally verified markers are indicated in red sphere, computationally predicted markers are indicated in green, and novel markers are indicated in blue. c The positions of PB2 domains are color-coded and labeled according to their functions

Sites of coevolution in PB2

Patterns of amino acid conservation across a large set of homologs can be utilized to identify structurally or functionally important residues; meanwhile, patterns of correlated substitutions or amino acid covariation can also reveal important residuesCitation24. To explore the coevolution profile of PB2 protein in H1N1 and H3N2 viruses, we quantified the covariation between site pairs with mutual information (MI)Citation25. Since identification of coevolution with MI requires sufficiently large alignment of homologous sequencesCitation24, we combined avian, swine, and seasonal human IAV PB2 sequences. However, MI values can be misleading when homologous sequences are not collected properly or the sequence alignment is not built correctlyCitation26. In order to take different homologous sequences into account more equivalently and to make the MI values more comparable between H1N1 and H3N2, we performed resampling to construct balanced samples with equal number of avian, swine, and seasonal human IAVs. Using this approach, we identified 35 (Supplementary Table S3) and 46 (Supplementary Table S4) coevolving site pairs, embracing 13 (Fig. ) and 18 (Fig. ) identified markers in PB2 of H1N1 and H3N2, respectively. These coevolving sites are distributed in almost all domains except Mid domain (Figs.  and ). The Nter, cap-binding, and 627 domains are highly connected with each other in both H1N1 and H3N2 coevolution networks, consistent to the idea that the N-terminal third of PB2 (amino acids 1–247) is not only structurally but also functionally a part of the polymerase coreCitation27. Notably, the epistatic effects of a number of coevolving pairs have been demonstrated in previous wet-lab studies. For instance, sites 199 and 627 show high covariation with each other in both H1N1 and H3N2 subtypes. Consistently, the significant synergistic effect of A199S and E627K on viral replication and pathogenicity was observed in H5N1 subtypeCitation28, which can lend support to the validity of our approach.

Fig. 2 MI network of seasonal H1N1 IAV PB2 protein.

a Circular representation of PB2 protein and the MI network. The positions of PB2 domains are color-coded and labeled according to their functions. Arcs connect site pairs with normalized MI > 0.7. b Cytoscape representation of the MI network. Sites are represented as nodes. Edge between two nodes indicates the normalized MI > 0.7. Sites with experimentally verified human-adaptive mutations and computationally identified markers are highlighted in red and green, respectively, in a, b; blue nodes indicate novel markers

Fig. 2 MI network of seasonal H1N1 IAV PB2 protein.a Circular representation of PB2 protein and the MI network. The positions of PB2 domains are color-coded and labeled according to their functions. Arcs connect site pairs with normalized MI > 0.7. b Cytoscape representation of the MI network. Sites are represented as nodes. Edge between two nodes indicates the normalized MI > 0.7. Sites with experimentally verified human-adaptive mutations and computationally identified markers are highlighted in red and green, respectively, in a, b; blue nodes indicate novel markers
Fig. 3 MI network of seasonal H3N2 IAV PB2 protein.

a Circular representation of PB2 protein and the MI network. The positions of PB2 domains are color-coded and labeled according to their functions. Arcs connect site pairs with normalized MI > 0.7. b Cytoscape representation of the MI network. Sites are represented as nodes. Edge between two nodes indicates the normalized MI > 0.7. Sites with experimentally verified human-adaptive mutations and computationally identified markers are highlighted in red and green, respectively, in a, b; blue nodes indicate novel markers

Fig. 3 MI network of seasonal H3N2 IAV PB2 protein.a Circular representation of PB2 protein and the MI network. The positions of PB2 domains are color-coded and labeled according to their functions. Arcs connect site pairs with normalized MI > 0.7. b Cytoscape representation of the MI network. Sites are represented as nodes. Edge between two nodes indicates the normalized MI > 0.7. Sites with experimentally verified human-adaptive mutations and computationally identified markers are highlighted in red and green, respectively, in a, b; blue nodes indicate novel markers

We also noted that multiple verified human-adaptive mutations are present in the coevolution networks, in connection with other identified markers. As shown in H1N1 coevolution network (Fig. ), the well-characterized human-adaptive mutations A199S and E627K are connected with a computational marker L475M. Another verified human-adaptive mutation K702R is connected with two computational markers and two novel markers identified in our study. The more extensive connections in H3N2 coevolution network are demonstrated in Fig. . The four well-characterized mutations A199S, K526R, E627K, and K702R are connected with nine computational markers and a novel marker E120D. The coevolution between site pairs indicates their structural or functional importance in conformation stabilization of the polymerase or in adaptation into different hosts. Collectively, we disclosed extensive coevolution networks in PB2 protein of H1N1 and H3N2 subtypes and demonstrated that a large portion (>40%) of the identified human-adaptive markers exhibit significant coevolution.

Adaptive evolutionary history of PB2 protein

In an effort to understand how the identified markers emerged temporally, we reconstructed the most recent common ancestor (MRCA) (Supplementary Text S1) and the evolutionary history of PB2 segments of seasonal IAVs with Bayesian phylogenetic inference. The representative sequences of evolutionary history were selected with the linear regression of root-to-tip genetic distance against divergence time under the strict molecular clock. As shown in Fig. , 115 (Supplementary Text S2) and 170 (Supplementary Text S3) sequences are retained for the reconstruction of evolutionary history of H1N1 and H3N2 respectively, with regression r2 exceeding 0.9.

Fig. 4 Correlation of root-to-tip divergence with evolution time.

a Correlation of representative H1N1 PB2 sequences from 1918 to 2009 with root-to-tip divergence. b Correlation of representative H3N2 PB2 sequences from 1968 to 2016 with root-to-tip divergence. The x axis represents the evolution time of the MRCA; the y axis represents the sequence divergence from the MRCA. Each dot corresponds to a PB2 protein sequence

Fig. 4 Correlation of root-to-tip divergence with evolution time.a Correlation of representative H1N1 PB2 sequences from 1918 to 2009 with root-to-tip divergence. b Correlation of representative H3N2 PB2 sequences from 1968 to 2016 with root-to-tip divergence. The x axis represents the evolution time of the MRCA; the y axis represents the sequence divergence from the MRCA. Each dot corresponds to a PB2 protein sequence

The maximum clade credibility (MCC) tree and the corresponding mutational path were summarized from the Bayesian Markov chain Monte Carlo (MCMC) simulations. The MCC trees illustrating the evolution of the PB2 gene of seasonal H1N1 and H3N2 have a “cactus-like” shape with a strong temporal structure (Figs.  and ). The trunk represents the succession of surviving viral lineages over time while short side branches indicate the extinction of most strains. The evolutionary history of seasonal H1N1 from 1918 to 2009 (replaced by the 2009 pH1N1Citation29,Citation30) was reconstructed with 115 sequences, with roughly 1.26 sequences per year. In contrast, the evolutionary history of H3N2 from 1968 to 2016 is covered by 3.54 sequences per year. Accordingly, the mutational path of H3N2 (Fig. ) is more definite than that of H1N1 (Fig. ).

Fig. 5 MCC tree and mutational path of seasonal H1N1 IAV.

a MCC tree of seasonal H1N1 IAVs circulated between 1918 and 2009. A total of 115 representative PB2 sequences were selected to reconstruct the tree by Bayesian phylogenetic inference. The evolutionary path from the MRCA to the most divergent descendant (A/California/6/2007) is highlighted in the tree. b Reconstructed mutational path from the MRCA to the most divergent descendant. The x and y axes represent the time scale and the mutations in the path, respectively. The median and 90% BCI of estimated date of each mutation are shown in the boxplot. Experimentally verified markers, computationally predicted markers, and novel markers are indicated in red, green, and blue, respectively

Fig. 5 MCC tree and mutational path of seasonal H1N1 IAV.a MCC tree of seasonal H1N1 IAVs circulated between 1918 and 2009. A total of 115 representative PB2 sequences were selected to reconstruct the tree by Bayesian phylogenetic inference. The evolutionary path from the MRCA to the most divergent descendant (A/California/6/2007) is highlighted in the tree. b Reconstructed mutational path from the MRCA to the most divergent descendant. The x and y axes represent the time scale and the mutations in the path, respectively. The median and 90% BCI of estimated date of each mutation are shown in the boxplot. Experimentally verified markers, computationally predicted markers, and novel markers are indicated in red, green, and blue, respectively
Fig. 6 MCC tree and mutational path of seasonal H3N2 IAV.

a MCC tree of seasonal H3N2 IAVs circulating between 1968 and 2016. A total of 170 representative PB2 sequences were selected to reconstruct the tree by Bayesian phylogenetic inference. The evolutionary path from the MRCA to the most divergent descendant (A/Hawaii/13/2016) is highlighted in the tree. b Reconstructed mutational path from the MRCA to the most divergent descendant. The x and y axes represent the time scale and the mutations in the path, respectively. The median and 90% BCI of estimated date of each mutation are shown in the boxplot. Experimentally verified markers, computationally predicted markers, and novel markers are indicated in red, green, and blue, respectively

Fig. 6 MCC tree and mutational path of seasonal H3N2 IAV.a MCC tree of seasonal H3N2 IAVs circulating between 1968 and 2016. A total of 170 representative PB2 sequences were selected to reconstruct the tree by Bayesian phylogenetic inference. The evolutionary path from the MRCA to the most divergent descendant (A/Hawaii/13/2016) is highlighted in the tree. b Reconstructed mutational path from the MRCA to the most divergent descendant. The x and y axes represent the time scale and the mutations in the path, respectively. The median and 90% BCI of estimated date of each mutation are shown in the boxplot. Experimentally verified markers, computationally predicted markers, and novel markers are indicated in red, green, and blue, respectively

We also estimated the emerging dates of the identified markers (Table ). Among the 28 identified human-adaptive markers in H1N1, the MRCA of H1N1 PB2 harbored 6 markers originally and acquired the other 22 markers until 1993 (90% Bayesian credible interval, BCI, 1990–1995). Interestingly, three out of the six markers, A199S, E627K, and K702R, are top-ranked human-adaptive mutations, suggesting their importance in crossing the species barrier at the early stage. The remaining three verified human-adaptive mutations D9N, T271A, and A588I were acquired before 1922 (BCI 1917–1926). In contrast, among the 31 identified markers in H3N2 IAVs, the MRCA of H3N2 PB2 harbored 23 markers originally, of which 7 markers have been experimentally verified (Table ). The remaining 8 markers were acquired sequentially within around 25 years, the last verified human-adaptive mutation K526R was acquired in 1969 (BCI 1967–1971). Therefore, the PB2 segment of H3N2 developed a distinct adaptive pathway compared to H1N1. The PB2 protein of seasonal H1N1 viruses may originate from a limitedly adapted avian-origin IAV. It acquired most of the human-adaptive mutations during circulating in human population. On the contrary, the PB2 segment of seasonal H3N2 was derived from a well-adapted predecessor originating from seasonal H1N1 viruses through segment reassortmentCitation1. It acquired human adaptation mutations, such as K526R and A684S, which are absent in seasonal H1N1 viruses.

Estimated emerging dates of identified markers in seasonal H1N1 and H3N2 PB2

Discussion

Adaptive mutations in the PB2 polymerase play a critical role for avian influenza virus replication in mammalian cells and enable some avian IAVs to establish infection in humans. In this study, we conducted a large-scale sequence comparison between avian and human IAVs, whereby a comprehensive list of novel human-adaptive markers in PB2 protein of H1N1 and H3N2 viruses were identified. To our knowledge, this is basically an exhaustive list, which encompasses most well-characterized human-adaptive markers in PB2 protein identified in prior studies as well as the novel markers obtained from this study. The identification of a large pool of adaptive markers allows us to uncover the coevolution pattern among these markers, which implicates their correlated function in host adaptation. Additionally, we demonstrated the distinct evolutionary pathways of seasonal H1N1 and H3N2 viruses.

Previous similar computational studies tend to focus on human-adaptive mutations that are conserved in all subtypesCitation19. Intriguingly, we noted that two well-characterized human-adaptive mutations K526R and A684S are specific in the H3N2 subtype, suggesting the necessary concerns for subtype-specific mutations. Therefore, we relaxed the assumption and compared seasonal H1N1 and H3N2 to avian viruses separately. With our approach, a number of novel markers that have not been reported in similar computational studies were identified in both H1N1 and H3N2 subtypes. Notably, an experimentally verified mutation K526R, which has not been identified in previous computational studiesCitation19,Citation20,Citation31, was revealed in our study. The outperformance of our approach may originate from two improvements. First, we relaxed the assumption in previous approaches that human-adaptive mutations should be conserved across subtypes as aforementioned. Second, the dramatically expanding databases provide large-scale thoroughly sampled sequence data. With the availability of comprehensive sequence data of seasonal IAVs, we can significantly improve the predictive power.

There has been a growing recognition that epistasis plays a key role in functional evolution of proteins by constraining accessible evolutionary pathways and increasing the role of contingency in adaptationCitation17,Citation32,Citation33. The functional effect of a given substitution frequently depends on the presence or absence of other substitution(s)Citation17. Thus epistatic sites usually present covariation or coevolution and exhibit particular patterns in multiple sequence alignment (MSA) of homologous proteins. The algorithms for detecting coevolution can be divided into two general classes: algorithms such as MI and statistical coupling analysis that score covariation between all pairs of columns in a sequence alignment; while global probabilistic models such as direct coupling analysis that assess the likelihood of covariation between sitesCitation24. The former algorithm has been utilized in our study to quantify the inter-residue covariation, since it is conceptually straightforward, technically simpler to implement, and often sufficiently powerful to provide useful insights of coevolution.

Analyses of amino acid coevolution within protein family can serve as a valuable guide for identifying residues that are functionally coupledCitation24. Indeed, a number of the identified coevolving pairs in this study showed significant epistasis in previous wet-lab studies. For instance, combination of coevolving pair A199S and E627K imposed a strong synergistic effect on replication efficiencyCitation28. R368K, a coevolving partner of E627K, showed limited effect when introduced alone into a H5N1 virus strain but significantly increased replication efficiency and pathogenicity when combined with 627KCitation28. Moreover, the adaptive effects of a number of mutations in PA and NP proteins also showed dependency on E627K. For instance, three NP mutations R100V/I and L283PCitation28 can cause a failure of virus rescue but showed highly enhanced replication efficiency with 627KCitation28. Integrating the prior findings and discoveries obtained from our study, we believe that the strong human-adaptive effect may not solely result from the verified mutations as demonstrated previouslyCitation6. Instead, the effects of verified mutations may depend on the intricate interplays with other mutations. Currently, we only focused on the inter-residue coevolution within PB2 protein due to the lack of adequate whole-genome sequences. Upon the availability of sufficient whole-genome sequences in the future, inter-protein coevolution analysis could uncover important residues that are involved in protein–protein interactions.

This study, like similar studies, is sensitive to sampling biases that have to be estimated and controlled properly. Otherwise, the true distinctions would probably be masked. For instance, H1N1 sequences were highly oversampled in 2009 during the pandemic so that the number of pH1N1 sequences is far more than the total amount of seasonal H1N1 in public databases. Additionally, pH1N1 actually derived from multiple reassortment events and contains a swine-origin PB2 segmentCitation34,Citation35, which may override the true distinctions between avian and seasonal human IAVs. Therefore, we carefully eliminated pH1N1 prior to comparison. Despite all the efforts, our comparative method does have limitations. We are unable to identify newly fixed adaptive mutations, since they are unlikely to achieve predominance to fulfill the first criterion. For instance, two verified mutations M535L and E249G, which were acquired by seasonal H1N1 (Fig. ) and H3N2 (Fig. ) subtypes recently, are not identified with our method.

In summary, we designed a simple approach to study the human adaptation of seasonal IAVs. We identified a large number of human-adaptive markers and profiled the coevolution among them. In addition, we inferred the MRCA and adaptive evolutionary history of seasonal IAVs and estimated the emerging dates and sequential order of each identified markers. We believe that our findings will provide clues for further experimental validation of singular and combinatorial human-adaptive mutations and shed light on the human adaptation process of seasonal IAVs.

Materials and methods

Sequence preprocessing

The PB2 sequences of avian and human IAVs were retrieved from the OpenFluDB database (http://openflu.vital-it.ch). Subtype distributions of both avian and human IAVs were estimated. For avian IAV viruses, replicate sequences within the same subtype due to oversampling were removed. For human IAV viruses, non-seasonal subtypes such as 2009 pandemic H1N1 (3083 sequences), H5N1 (367 sequences), and H7N9 were excluded. The remaining IAVs were subdivided into two subsets, H1N1 and H3N2. Replicate sequences within each subset collected in the same year were excluded to eliminate oversampling in certain years. Nonsense characters of each sequence were trimmed while those with partial length (<759 aa) were removed. MSA of PB2 was constructed using the Muscle v3.7Citation36 software with the fastest parameters (-maxiters 2 without refinement), due to the large amount of sequences.

Sites with host-specific amino acids

Site-wise amino acid compositions in PB2 of H1N1 and H3N2 subsets were compared to those of avian IAVs, respectively. Frequencies (F) of amino acids in each aligned position were calculated. The predominant amino acid of each site is defined as that with the largest F. Pearson’s chi-square test was used to test the statistical significance of amino acid distribution difference in the corresponding position between avian and human IAVs. Cramer’s V test was utilized to normalize the chi-square statistic to control for dataset size and quantify the effect size. Accordingly, two criteria were employed to define an amino acid as a human-adaptive marker in a given site: (i) the frequency of the amino acid is predominant (F > 0.5) in human IAVs and minor (F < 0.5) in avian IAVs; (ii) the Cramer’s V value is >0.8. The marker is indicated as “A + site + B” where A and B are the predominant amino acids of avian and human IAVs, respectively, in that site and the substitution from A to B is assumed to be responsible for human adaptation. Only the counts of amino acid A and B in each aligned position of avian and human IAVs were taken into tests to ensure the same degree. We used 1% of the total count as a pseudo value in tests when the observed count is <5. The sequence comparison was implemented using Python and Biopython frameworkCitation37. Statistical tests were performed using R languageCitation38. Sites with host-specific amino acids were mapped onto the polymerase complex (PDB code: 4WSB) by using Pymol. The source code is available upon request.

Coevolution analysis

Shannon’s entropy (H) is a measure of uncertainty or randomnessCitation39. The entropy of a column c in a MSA is calculated with the following equation:,Hc=-i=120p(xi)log20p(xi)Here p(xi) is the observed frequency of amino acid i occurring at a site. All values were calculated using a log20 so that the range of position entropy scores is 0–1. The covariation between two sites is quantified using MI, which quantifies the mutual dependence between two variables. The MI between two positions in a MSA is given asMIc,d=Hc+Hd-Hcdwhere Hc and Hd are entropies of column c and d, respectively, and Hcd is the joint entropy of column c and d calculated by the same method using the frequencies of occurrence of each combination of residues in column c and d. The MI scores range from 0 to the minimum of Hc or Hd. The raw MI values are normalized by dividing by the joint entropy of the positions, Hcd, to reduce the influence of entropy on MI. Normalized MI values range from 0 to 1Citation25.

Homologous sequence alignment for covariation quantification was constructed with avian, swine, and human IAV PB2 sequences of the same subtype. All sequences were retrieved from OpenFluDB. Replicates in each virus subset were excluded for computational simplicity. Owing to the sensitivity of MI to sampling balance, re-sampling was performed for 1000 times, in which we randomly extracted equal number of sequences from each host category with replacement to construct balanced samples. The number of extraction was determined by the minimum sequence count of the subsets. Average MI values between all site pairs of the 1000 reconstructed samples were then calculated. Site pairs with average MI value >0.7 and average entropy values of both sites >0.2 were designated as coevolving sites. Entropy and MI value calculation was implemented in custom python scripts.

MCC tree and evolutionary history reconstruction

Preprocessed human seasonal IAV sequences were used for the construction of MCC tree and evolutionary history. A maximum of 10 sequences of H1N1 and H3N2 per year were retained to avoid the trees being too large. Next, RAxMLCitation40 (version 8.2.9; http://sco.hits.org/exelixis/web/soft-ware/raxml/) was utilized to infer the parsimony tree with JTT model. The tree was then visually analyzed using TempEstCitation41 (version 1.5; http://tree.bio.ed.ac.uk/software/tempest/) to identify potential violations that substantially deviated from the linear regression of root-to-tip genetic distance against divergence time. We removed the outliers and then repeated these two steps to achieve a high consistence between molecular clock and stamped dates. The remaining sequences were used in downstream analyses.

To infer the evolutionary history and the MRCA for the PB2 sequences, a Bayesian MCMC method was applied as implemented in the BEAST package (version 1.8.3; http://beast.bio.ed.ac.uk). BEAGLE (http://beast.bio.ed.ac.uk/BEAGLE) was utilized to boost the core computation. The BEAST XML input file was generated using the combination of BEAUTI (inside BEAST package) and hand-annotation. This XML file that specifies date-stamped sequences, a strict molecular clock and a JTT model of substitution, was used in multiple runs of MCMC simulation. The MCMC chain was set as 100 million iterations, with subsampling every 10,000 iterations. Tracer (version 1.6; http://beast.bio.ed.ac.uk/tracer) was used to track the log file of combined runs with the initial 10% of the chain as burn-in to ensure good MCMC convergence. The MCC tree was summarized using TreeAnnotator (version 1.8.3; inside BEAST package) on the basis of merged simulations. FigTree (version 1.4.2; http://tree.bio.ed.ac.uk/software/figtree) was used to visualize the tree file, manually recolor the branches, and export to tree images. The mutational path was extracted from the merged trees using Mutpath python packageCitation33 (available at http://github.com/jbloom/mutpath), with manually created input file.

Availability of data and material

The datasets used and/or analyzed during the current study are available from the corresponding author on request.

Acknowledgements

This study was partially supported by the Providence Foundation Limited in memory of the late Dr Lui Hac Minh; as well as Health and Medical Research Fund (HMRF) Commissioned Studies on Emerging Influenza A Viruses with Epidemic Potential (Ref No. RRG-05) from Food and Health Bureau, Hong Kong SAR. We would also like to thank the research groups that contributed sequence data to the OpenFluDB database at Swiss Institute of Bioinformatics (SIB).

Authors' contributions

L.W., J.Z., and K.-Y.Y. conceived and designed the study. L.W. collected the sequence data, implemented the scripts for sequence processing, and data analysis. L.W., B.H.-Y.W., D.W., C.L., X.Z., M.-C.C., S.Y., Y.F., and J.Z. interpreted the results. L.W., H.C., H.C., J.Z., and K.-Y.Y. wrote and revised the manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Supplementary Information accompanies this paper at (10.1038/s41426-018-0050-0).

References

  • TaubenbergerJKKashJCInfluenza virus evolution, host adaptation and pandemic formationCell Host Microbe20107 440 45110.1016/j.chom.2010.05.0092892379
  • FodorEA single amino acid mutation in the PA subunit of the influenza virus RNA polymerase inhibits endonucleolytic cleavage of capped RNAsJ. Virol.2002768989900110.1128/JVI.76.18.8989-9001.2002136441
  • DiasAThe cap-snatching endonuclease of influenza virus polymerase resides in the PA subunitNature200945891491810.1038/nature07745
  • MedinaRAGarcia-SastreAInfluenza A viruses: new research developmentsNat. Rev. Microbiol.2011959060310.1038/nrmicro2613
  • WebsterRGBeanWJGormanOTChambersTMKawaokaYEvolution and ecology of influenza A virusesMicrobiol. Rev.199256152179372859
  • SubbaraoEKLondonWMurphyBRA single amino acid in the PB2 gene of influenza A virus is a determinant of host rangeJ. Virol.19936717611764240216
  • HattaMGrowth of H5N1 influenza A viruses in the upper respiratory tracts of micePLoS Pathog.200731374137910.1371/journal.ppat.0030133
  • SteelJLowenACMubarekaSPalesePTransmission of influenza virus in a mammalian host is increased by PB2 amino acids 627K or 627E/701NPLoS Pathog.20095e100025210.1371/journal.ppat.10002522603332
  • Czudai-MatwichVOtteAMatrosovichMGabrielGKlenkHDPB2 mutations D701N and S714R promote adaptation of an influenza H5N1 virus to a mammalian hostJ. Virol.2014888735874210.1128/JVI.00422-144136279
  • ZhuWDual E627K and D701N mutations in the PB2 protein of A(H7N9) influenza virus increased its virulence in mammalian modelsSci. Rep.2015510.1038/srep141704585756
  • MehleADoudnaJAAdaptive strategies of the influenza virus polymerase for replication in humansProc. Natl. Acad. Sci. USA2009106213122131610.1073/pnas.09119151062789757
  • LiuQCombination of PB2 271A and SR polymorphism at positions 590/591 is critical for viral replication and virulence of swine influenza virus in cultured cells and in vivoJ. Virol.2012861233123710.1128/JVI.05699-113255826
  • SongWThe K526R substitution in viral protein PB2 enhances the effects of E627K on influenza virus replicationNat. Commun.2014510.1038/ncomms65094263149
  • BusseyKAPA residues in the 2009 H1N1 pandemic influenza virus enhance avian influenza virus polymerase activity in mammalian cellsJ. Virol.2011857020702810.1128/JVI.00522-113126589
  • XuCAmino acids 473V and 598P of PB1 from an avian-origin influenza A virus contribute to polymerase activity, especially in mammalian cellsJ. Gen. Virol.20129353154010.1099/vir.0.036434-0
  • DankarSKInfluenza A virus NS1 gene mutations F103L and M106I increase replication and virulenceVirol. J.201181310.1186/1743-422X-8-133032709
  • KryazhimskiySDushoffJBazykinGAPlotkinJBPrevalence of epistasis in the evolution of influenza A surface proteinsPLoS Genet.20117e100130110.1371/journal.pgen.10013013040651
  • FanSNovel residues in avian influenza virus PB2 protein affect virulence in mammalian hostsNat. Commun.2014510.1038/ncomms60215841464
  • FinkelsteinDBPersistent host markers in pandemic and H5N1 influenza virusesJ. Virol.200781102921029910.1128/JVI.00921-072045501
  • ChenGWGenomic signatures of human versus avian influenza A virusesEmerg. Infect. Dis.2006121353136010.3201/eid1209.0602763294750
  • MiottoOComplete-proteome mapping of human influenza A adaptive mutations: implications for human transmissibility of zoonotic strainsPLoS ONE.20105e902510.1371/journal.pone.00090252815782
  • MiottoOHeinyATTanTWAugustJTBrusicVIdentification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysisBMC Bioinformatics2008910.1186/1471-2105-9-S1-S182259419
  • ElgendyEMIdentification of polymerase gene mutations that affect viral replication in H5N1 influenza viruses isolated from pigeonsJ. Gen. Virol.20179861710.1099/jgv.0.000674
  • AshenbergOLaubMTUsing analyses of amino acid coevolution to understand protein structure and functionMethods Enzymol.201352319121210.1016/B978-0-12-394292-0.00009-6
  • MartinLCGloorGBDunnSDWahlLMUsing information theory to search for co-evolving residues in proteinsBioinformatics2005214116412410.1093/bioinformatics/bti671
  • SimonettiFLTeppaEChernomoretzANielsenMMarino BusljeCMISTIC: mutual information server to infer coevolutionNucleic Acids Res.201341W8W1410.1093/nar/gkt4273692073
  • NilssonBEte VelthuisAJWFodorERole of the PB2 627 domain in influenza A virus polymerase functionJ. Virol.201791e02467024165355620
  • KimJHRole of host-specific amino acids in the pathogenicity of avian H5N1 influenza viruses in miceJ. Gen. Virol.2010911284128910.1099/vir.0.018143-02878586
  • PicaNHemagglutinin stalk antibodies elicited by the 2009 pandemic influenza virus as a mechanism for the extinction of seasonal H1N1 virusesProc. Natl. Acad. Sci. USA20121092573257810.1073/pnas.12000391093289326
  • ZhouJA functional variation in CD55 increases the severity of 2009 pandemic H1N1 influenza A virus infectionJ. Infect. Dis.201220649550310.1093/infdis/jis378
  • TamuriAUdos ReisMHayAJGoldsteinRAIdentifying changes in selective constraints: host shifts in influenzaPLoS Comput. Biol.20095e100056410.1371/journal.pcbi.10005642770840
  • WeinreichDMWatsonRAChaoLPerspective: sign epistasis and genetic constraint on evolutionary trajectoriesEvolution20055911651174
  • GongLISuchardMABloomJDStability-mediated epistasis constrains the evolution of an influenza proteineLife20132e0063110.7554/eLife.006313654441
  • MenaIOrigins of the 2009 H1N1 influenza pandemic in swine in MexicoeLife20165e167774957980
  • HungIFEffect of clinical and virological parameters on the level of neutralizing antibody against pandemic influenza A virus H1N1 2009Clin. Infect. Dis.20105127427910.1086/653940
  • EdgarRCMUSCLE: multiple sequence alignment with high accuracy and high throughputNucleic Acids Res.2004321792179710.1093/nar/gkh340390337
  • CockPJBiopython: freely available Python tools for computational molecular biology and bioinformaticsBioinformatics2009251422142310.1093/bioinformatics/btp1632682512
  • Ash, R. B. Information Theory. (Dover Publications, 1965).
  • StamatakisARAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogeniesBioinformatics2014301312131310.1093/bioinformatics/btu0333998144
  • RambautALamTTMax CarvalhoLPybusOGExploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)Virus Evol.20162vew00710.1093/ve/vew0074989882
  • GraefKMThe PB2 subunit of the influenza virus RNA polymerase affects virulence by interacting with the mitochondrial antiviral signaling protein and inhibiting expression of beta interferonJ. Virol.2010848433844510.1128/JVI.00879-102919034
  • WangJMouse-adapted H9N2 influenza A virus PB2 protein M147L and E627K mutations are critical for high virulencePLoS ONE20127e4075210.1371/journal.pone.00407523393695
  • ZhouBPB2 residue 158 is a pathogenic determinant of pandemic H1N1 and H5 influenza a viruses in miceJ. Virol.20118535736510.1128/JVI.01694-10
  • NgaiKLKChanMCWChanPKSReplication and transcription activities of ribonucleoprotein complexes reconstituted from avian H5N1, H1N1pdm09 and H3N2 influenza A virusesPLoS ONE20138e6503810.1371/journal.pone.00650383672204
  • YamajiRIdentification of PB2 mutations responsible for the efficient replication of H5N1 influenza viruses in human lung epithelial cellsJ. Virol.2015893947395610.1128/JVI.03328-144403392
  • MokCKAmino acid residues 253 and 591 of the PB2 protein of avian influenza virus A H9N2 contribute to mammalian pathogenesisJ. Virol.2011859641964510.1128/JVI.00702-113165745
  • ManzoorRPB2 protein of a highly pathogenic avian influenza virus strain A/chicken/Yamaguchi/7/2004 (H5N1) determines its replication potential in pigsJ. Virol.2009831572157810.1128/JVI.01879-08
  • BusseyKABousseTLDesmetEAKimBTakimotoTPB2 residue 271 plays a key role in enhanced polymerase activity of influenza A viruses in mammalian host cellsJ. Virol.2010844395440610.1128/JVI.02642-092863787
  • HayashiTWillsSBusseyKATakimotoTIdentification of influenza A virus PB2 residues involved in enhanced polymerase activity and virus growth in mammalian cells at low temperaturesJ. Virol.2015898042804910.1128/JVI.00901-154505657
  • ChenGWGenomic signatures for avian H7N9 viruses adapting to humansPLoS ONE201611e014843210.1371/journal.pone.01484324742285
  • CauldwellAVMoncorgeOBarclayWSUnstable polymerase-nucleoprotein interaction is not responsible for avian influenza virus polymerase restriction in human cellsJ. Virol.2013871278128410.1128/JVI.02597-123554100
  • XiaoCPB2-588V promotes the mammalian adaptation of H10N8, H7N9 and H9N2 avian influenza virusesSci. Rep.2016610.1038/srep194744726052
  • ZhaoZPB2-588I enhances 2009 H1N1 pandemic influenza virus virulence by increasing viral replication and exacerbating PB2 inhibition of beta interferon expressionJ. Virol.2014882260226710.1128/JVI.03024-133911566
  • FoegleinAacuteInfluence of PB2 host-range determinants on the intranuclear mobility of the influenza A virus polymeraseJ. Gen. Virol.2011921650166110.1099/vir.0.031492-03167894
  • GabrielGThe viral polymerase mediates adaptation of an avian influenza virus to a mammalian hostProc. Natl. Acad. Sci. USA2005102185901859510.1073/pnas.05074151021317936