Original Research

Inferences on the biochemical and environmental regulation of universal stress proteins from Schistosomiasis parasites

Pages 15-27 | Published online: 10 May 2013

Figures & data

Figure 1 Overview of a set of bioinformatics and visual analytics methods used to prioritize protein sequences for further research.

Notes: The core of the prioritization process is a visual analytics stage (Stage 4) that enables the interaction of researcher(s) with the results from the bioinformatics analyses (Stage 2 and Stage 3) of the protein sequences (Stage 1). The evolutionary relatedness of the protein sequences is based on statistically supported groups in a phylogenetic tree that is derived, in turn, from the multiple sequence alignment of all the sequences (Stage 3). Additional evidence for evolutionary relatedness is obtained from the gene synteny on the chromosomal regions. The protocol can be particularly suited for identifying orthologous proteins with shared patterns of sequence and functional annotations (Stage 5). In the context of schistosomiasis parasites from different regions of the world, the identified proteins could be targets for understanding shared biological processes during the life cycle of parasites. Details of each method are available in the Methods section of the article.

Table 1 Annotation features for universal stress proteins of Schistosoma mansoni and Schistosoma japonicum

Figure 2 Grouping of 13 Schistosoma USPs by sequence length.

Notes: The image provides a visual comparison of the protein sequence and USP domain sequence lengths for 13 Schistosoma USPs. A visual analytics resource that can be used to interact with the data is available at http://public.tableausoftware.com/views/schisto_features_usp/groupbylength. Sequences of Schistosoma mansoni have “Smp” in the sequence identifier.
Abbreviations: aa, amino acid; UniProt, Universal Protein Resource (Apweiler et al, ref. 47); USP, universal stress protein.

Figure 3 Grouping of 13 Schistosoma universal stress proteins by functional site signature.

Notes: The functional site signature is constructed by joining the twelve ligand-binding sites known for the ATP-binding USP from Methanocaldococcus jannaschii (UniProt [Apweiler et al, ref. 47] ID: Y577_METJA). The image provides a visual comparison of the functional site signatures for 13 Schistosoma USPs. A visual analytics resource that can be used to interact with the data is available at http://public.tableausoftware.com/views/schisto_features_usp/groupbylength. Sequences of Schistosoma mansoni have “Smp” in the sequence identifier.
Abbreviations: ATP, adenosine triphosphate; UniProt, Universal Protein Resource; USP, universal stress protein.
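
The idea of a functional site signature can be illustrated with a minimal sketch. It assumes the binding sites have already been mapped to columns of an alignment; the sequence names, residues, and column indices below are toy values invented for illustration, not data from the article, and the helper names are hypothetical.

```python
from collections import defaultdict


def site_signature(aligned_seq, site_columns):
    """Join the residues found at the given 0-based alignment columns."""
    return "".join(aligned_seq[c] for c in site_columns)


def group_by_signature(aligned, site_columns):
    """Map each distinct signature to the sequence IDs that carry it."""
    groups = defaultdict(list)
    for seq_id, seq in aligned.items():
        groups[site_signature(seq, site_columns)].append(seq_id)
    return dict(groups)


# Toy aligned sequences (hypothetical, not from the article).
aligned = {
    "toy_A": "MADGSKHVLP",
    "toy_B": "MADGSKHILP",
    "toy_C": "MTDASKHVLP",
}
# Hypothetical binding-site columns; the article joins twelve sites.
site_columns = [1, 3, 5]

groups = group_by_signature(aligned, site_columns)
for signature, ids in groups.items():
    print(signature, ids)
```

Sequences that share a signature fall into the same group, which is the kind of comparison the figure presents visually.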

Figure 4 Multiple sequence alignment of the sequences of selected universal stress proteins of Schistosoma mansoni and Schistosoma japonicum.

Notes: The sequence alignment of the 13 sequences with the ATP-binding motif [G-2X-G-9X-G(S/T)] was generated using ClustalW (Larkin et al, ref. 56). Sequences of S. mansoni have “Smp” in the sequence identifier. The ligand-binding sites (functional sites) annotated in the Conserved Domain Database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) are labeled with hashes (#). Aspartate, leucine, glycine, histidine, and proline residues are conserved in all the sequences (denoted by ^), at positions 57, 101, 127, 166, and 176 in Smp_076400 from S. mansoni. These conserved residues could be common functional sites for biochemical or environmental regulation of Schistosoma universal stress proteins. Meaning of alignment symbols: “*”, residues in column are identical; “:”, conserved substitutions; “.”, semiconserved substitutions. A visual analytics resource that can be used to interact with the data is available at http://public.tableausoftware.com/views/schisto_features_usp/usp_align.
Abbreviations: ATP, adenosine triphosphate; D, aspartate; G, glycine; H, histidine; L, leucine; P, proline; UniProt, Universal Protein Resource; USP, universal stress protein.
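
As a rough illustration of how fully conserved columns (the “*” columns) can be located and reported as residue positions in one reference sequence, the following sketch may help. The aligned sequences are toy values and the function names are hypothetical; this is not the ClustalW procedure itself, only a post-processing step under the assumption that a gapped alignment is already available as equal-length strings.

```python
def conserved_columns(aligned):
    """Return 0-based columns where every sequence has the same residue."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    return [
        i for i in range(length)
        if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)
    ]


def ungapped_position(aligned_seq, column):
    """1-based residue position in the ungapped sequence for a column.

    The column must hold a residue (not a gap) in aligned_seq.
    """
    return len(aligned_seq[:column + 1].replace("-", ""))


# Toy alignment (hypothetical, not from the article).
aligned = {
    "toy_ref": "MD-LKGH",
    "toy_2": "ADQLRGH",
    "toy_3": "SD-LKGP",
}

columns = conserved_columns(aligned)
ref = aligned["toy_ref"]
conserved = [(ungapped_position(ref, c), ref[c]) for c in columns]
print(conserved)
```

Reporting positions relative to one sequence mirrors how the notes give the conserved positions in terms of Smp_076400.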

Figure 5 Grouping of 13 Schistosoma universal stress protein sequences.

Notes: The phylogenetic tree was generated with MEGA5 (Tamura et al, ref. 59), using the maximum likelihood method. The 13 Schistosoma universal stress protein sequences were clustered in five groups (A–E). The numbers near the clades are the statistics from the 1000 bootstrap replications that support the recovery of the clades. A visual analytics resource that can be used to view the image, with other associated data, is available at http://public.tableausoftware.com/views/schisto_features_usp/phylotrees. Sequences of S. mansoni have “Smp” in the sequence identifier.

Figure 6 Parsimony test for the phylogeny tree reconstructed for Schistosoma universal stress proteins with the maximum likelihood method.

Notes: Maximum parsimony analysis of each of the 1000 bootstrap replications from maximum likelihood determined the percentage of the bootstrap replications in which a particular clade (a node and all of its descendant taxa) was recovered. Clades recovered in close to 100% of the bootstrap replications indicate confident statistical support in our analysis. A visual analytics resource that can be used to view the image, with other associated data, is available at http://public.tableausoftware.com/views/schisto_features_usp/phylotrees. Sequences of S. mansoni have “Smp” in the sequence identifier.
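
The support percentage described above can be sketched as a simple count. The sketch assumes each bootstrap replicate tree has already been reduced to the set of clades it contains, with each clade written as a set of taxon names; the taxa and replicates below are toy values, and `clade_support` is a hypothetical helper rather than a MEGA5 function.

```python
def clade_support(clade, replicate_trees):
    """Percentage of replicate trees in which the clade was recovered."""
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Four toy replicates, each a set of clades (hypothetical taxa a-d).
replicates = [
    {frozenset({"a", "b"}), frozenset({"c", "d"})},
    {frozenset({"a", "b"}), frozenset({"a", "b", "c"})},
    {frozenset({"a", "c"}), frozenset({"b", "d"})},
    {frozenset({"a", "b"}), frozenset({"c", "d"})},
]

support_ab = clade_support({"a", "b"}, replicates)
print(support_ab)
```

In the article the same kind of percentage is computed over 1000 replications rather than four.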

Figure 7 Integration and visualization of the data on the sequence features, evolutionary relatedness, and developmental expression of Schistosoma universal stress proteins (Q86DW2 and G4LZI3).

Notes: The integration and visualization design was implemented in the visual analytics software environment (Tableau Software Inc, Seattle, WA, USA). Among the 13 sequences compared, the two 184-amino-acid-long sequences Q86DW2 (Sjp_0058490) and G4LZI3 (Smp_076400) were prioritized for further research. The decision was based on statistical support from the phylogenetic analysis as well as the relatively complete and consistent annotations in the protein sequence length, biologically relevant chemical ligands, and ligand-binding amino acids (amino acid type and amino acid position). A visual analytics resource that can be used to interact with the view is available at http://public.tableausoftware.com/views/schisto_features_usp/phylo_group. Sequences of S. mansoni have “Smp” in the sequence identifier.
Abbreviations: ADP, adenosine diphosphate; AMP, adenosine monophosphate; ATP, adenosine triphosphate; CA, calcium; D, aspartate; G, glycine; GTP, guanosine triphosphate; I, isoleucine; Mg, magnesium; PKC, protein kinase C; P, proline; R, arginine; S, serine; UniProt, Universal Protein Resource (Apweiler et al, ref. 47); V, valine; Zn, zinc.

Figure 8 Integration and visualization of the data on the sequence features, evolutionary relatedness, and developmental expression of Schistosoma universal stress proteins (Q86DX1 and C1M0Q2).

Notes: This figure illustrates the decision-making process. In comparison with Q86DW2 and G4LZI3, the annotations for the protein sequence length, biologically relevant chemical ligands, and ligand-binding amino acids (type and position) were not identical. A visual analytics resource that can be used to interact with the view is available at http://public.tableausoftware.com/views/schisto_features_usp/phylo_group. Sequences of S. mansoni have “Smp” in the sequence identifier.
Abbreviations: ADP, adenosine diphosphate; AMP, adenosine monophosphate; ATP, adenosine triphosphate; D, aspartate; G, glycine; GTP, guanosine triphosphate; I, isoleucine; M, methionine; Mg, magnesium; PKA, protein kinase A; PKC, protein kinase C; P, proline; R, arginine; S, serine; T, threonine; UniProt, Universal Protein Resource (Apweiler et al, ref. 47); V, valine; Zn, zinc.

Figure 9 Design layout and visualization of data sets from the sequence analysis, evolutionary relatedness, and developmental expression of 13 Schistosoma universal stress proteins.

Notes: The details of the annotation features are available in the Methods section. The constructed views and the data are available for download at http://public.tableausoftware.com/views/schisto_features_usp/integrated_view. The free software Tableau Reader (http://www.tableausoftware.com/products/reader) (Tableau Software Inc) can be used for offline access to the downloaded views and data.
Abbreviations: ADP, adenosine diphosphate; AMP, adenosine monophosphate; ATP, adenosine triphosphate; CA, calcium; D, aspartate; G, glycine; GTP, guanosine triphosphate; I, isoleucine; M, methionine; Mg, magnesium; PKA, protein kinase A; PKC, protein kinase C; P, proline; R, arginine; S, serine; T, threonine; UniProt, Universal Protein Resource (Apweiler et al, ref. 47); V, valine; Zn, zinc.