Miscellany

Preface

Bioinformatics

Pages 581-583 | Published online: 02 Jul 2007

Advances in genome sequencing, microarrays, proteomics and functional and structural genomics have been creating huge amounts of data, which allow significant learning and experimentation to be carried out using a multidisciplinary approach. Bioinformatics has already become an ideal research area for computer scientists, mathematicians and biologists to manage, analyse, and interpret functional information from biological data, sequences and structures. Sophisticated computer system theories and computing algorithms have been exploited or have emerged in the general area of computer mathematics, such as the analysis of algorithms, artificial intelligence, automata, computational complexity, computer security, concurrency and parallelism, data structures, knowledge discovery, DNA and quantum computing, randomization, semantics, symbol manipulation, numerical analysis and mathematical software.

The overall aim of this Special Issue is to bridge the gap between computer mathematics and bioinformatics. We believe that computer mathematics would provide powerful tools for analysing, predicting, and understanding data from gene expression, drug design and other emerging genomic and proteomic technologies. There were more than 30 papers submitted to this Special Issue, which covered both the practical and theoretical aspects of novel computer-based mathematical theories or algorithms (e.g. classification, regression, clustering, probabilistic learning, inductive logic programming, reinforcement learning) with specific applications in bioinformatics. After a rigorous peer review process, eight papers have been selected that provide solutions, or early promises, to modelling, analysis and implementation in the general area of bioinformatics.

High-throughput gene expression profiling derived from microarray technologies provides a powerful approach for researchers to simultaneously observe expression changes of thousands of genes under a set of experimental conditions or over a series of time points. One challenge in understanding microarray data is how to extract meaningful information from the large volume of gene expression data. It is believed that genes with similar patterns of expression under a variety of conditions are likely to be regulated via similar mechanisms, and there is a great need to further improve the analytical methodologies that are currently used to identify co-expressed genes.

In the paper “A novel pattern based clustering methodology for time-series microarray data” by Phan and Famili et al., a novel pattern recognition method is proposed to select co-expressed genes based on the rate of change and modulation status of gene expression at each time interval. The proposed method is capable of identifying gene clusters consisting of highly similar shapes of expression profiles and modulation patterns. A quality index is also developed based on the semantic similarity in gene annotations to assess the likelihood of a cluster being a co-regulated group.
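The idea of describing a gene by its rate of change and modulation status at each time interval can be illustrated with a minimal sketch. It is not the method of Phan and Famili et al.: the threshold value, the U/D/N labels, the function names and the rule of grouping genes whose label strings match exactly are all assumptions made for illustration, as are the toy expression values.

```python
from collections import defaultdict

def modulation_pattern(series, threshold=0.5):
    """Label each time interval as up (U), down (D) or no change (N).

    The threshold is a hypothetical cut-off on the change per interval.
    """
    labels = []
    for prev, curr in zip(series, series[1:]):
        change = curr - prev  # rate of change over one unit-length interval
        if change > threshold:
            labels.append("U")
        elif change < -threshold:
            labels.append("D")
        else:
            labels.append("N")
    return "".join(labels)

def group_by_pattern(profiles, threshold=0.5):
    """Group genes whose modulation patterns are identical (assumed rule)."""
    groups = defaultdict(list)
    for gene, series in profiles.items():
        groups[modulation_pattern(series, threshold)].append(gene)
    return dict(groups)

# Toy time-series profiles (invented values, four time points each).
profiles = {
    "geneA": [0.0, 1.0, 2.1, 2.0],
    "geneB": [0.5, 1.6, 2.8, 2.9],
    "geneC": [2.0, 1.0, 1.1, 0.2],
}
clusters = group_by_pattern(profiles)
print(clusters)
```

In this toy input, geneA and geneB rise over the first two intervals and then level off, so they share a pattern, while geneC falls and ends up in a group of its own.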

In another paper “Protein sequence analysis using relational soft clustering algorithms” by Maji and Pal, two relational soft clustering algorithms are presented to select the most informative bio-bases. The concept of “degree of resemblance”, based on non-gapped pair-wise homology alignment score, circumvents the initialization and local minima problems of both c-medoids algorithms, which enables efficient selection of a minimum set of most informative bio-bases. Some indices are introduced for evaluating quantitatively the quality of selected bio-bases. The effectiveness of the algorithms, along with a comparison with other algorithms, is demonstrated on HIV (human immunodeficiency virus) protein data sets.

In the third paper “Robust filtering for gene expression time series data with variance constraints” by Wei and Wang et al., an uncertain discrete-time stochastic system is employed to represent the model for gene regulatory networks from time series data. A robust variance-constrained filtering problem is investigated for a gene expression model with stochastic disturbances and norm-bounded parameter uncertainties, where the stochastic perturbation is in the form of a scalar Gaussian white noise with constant variance and the parameter uncertainties enter both the system matrix and the output matrix. By using the linear matrix inequality (LMI) technique, sufficient conditions are first derived for ensuring the desired filtering performance for the gene expression model. Then, the filter gain is characterized in terms of the solution to a set of LMIs, which can be easily solved by using available software packages. A simulation example is exploited for a gene expression model in order to demonstrate the effectiveness of the proposed design procedures.

Molecular biologists are often interested in getting an overview of the objects in a biomolecular database, which makes classification one of their basic tasks: to which of the recognized classes in the database does a new molecule belong? Answering this question is a basic problem in structural bioinformatics. One of the difficulties of the task is that the results need approval from the biological community, since the quality of a classification algorithm is very difficult to measure.

In the paper “Invariant features for searching in protein fold databases” by Temerinac, Reisert and Burkhardt, a new algebraic method is proposed for structural comparison between proteins based on invariant features computed by group integration with spherical harmonics and Wigner D-matrices. Good classification is achieved without alignment by using intrinsic, pose-invariant features. The method is compared to existing algorithms such as DALI, PRIDE and the Gauss integral method in a classification and search task, and a web interface is provided to test the proposed method.

In the paper “Forward selection method with regression analysis for optimal gene selection in cancer classification” by Park, Yoo and Cho, a new gene selection method is proposed, based on a forward selection method with regression analysis, in order to find the informative genes for predicting cancer. The genes selected by this method tend to carry information about the cancer that does not overlap with that of the other selected genes. The sensitivity, specificity, and recognition rate of the selected genes are measured with a k-nearest neighbour classifier on a colon cancer dataset and a lymphoma dataset.
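A generic greedy forward selection driven by least-squares regression gives a feel for why the selected genes tend not to overlap in information. The sketch below is not the procedure of Park, Yoo and Cho: the function names, the orthogonalisation shortcut used to compute each regression gain, and the toy data are assumptions. At each step it picks the gene that most reduces the residual sum of squares of the class labels, then regresses that gene out of the labels and out of the remaining genes.

```python
def center(values):
    """Subtract the mean so that regressions need no intercept term."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward_select(expression, labels, n_genes):
    """Greedy forward selection by least-squares regression (illustrative).

    expression: dict of gene name -> expression values, one per sample
    labels: numeric class labels, e.g. 0 = normal, 1 = tumour
    """
    residual = center(labels)
    candidates = {g: center(v) for g, v in expression.items()}
    selected = []
    for _ in range(n_genes):
        best_gene, best_gain = None, 1e-12
        for gene, x in candidates.items():
            norm = dot(x, x)
            if norm < 1e-12:
                continue  # gene is fully explained by those already chosen
            gain = dot(x, residual) ** 2 / norm  # drop in residual sum of squares
            if gain > best_gain:
                best_gene, best_gain = gene, gain
        if best_gene is None:
            break  # no remaining gene improves the fit
        x = candidates.pop(best_gene)
        norm = dot(x, x)
        beta = dot(x, residual) / norm
        # Remove the chosen gene's contribution from the labels ...
        residual = [r - beta * xi for r, xi in zip(residual, x)]
        # ... and from every remaining gene, so redundant genes lose their gain.
        candidates = {
            g: [vi - (dot(x, v) / norm) * xi for vi, xi in zip(v, x)]
            for g, v in candidates.items()
        }
        selected.append(best_gene)
    return selected

# Toy data (invented): g2 is nearly a copy of g1, g3 carries different information.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g1": [1.0, 1.2, 0.8, 3.0, 3.1, 2.9],
    "g2": [1.3, 0.9, 0.8, 2.7, 3.3, 3.0],
    "g3": [5.0, 4.0, 6.0, 5.0, 4.5, 5.5],
}
selected = forward_select(expression, labels, n_genes=2)
print(selected)
```

On this toy input g1 is chosen first. Although g2 correlates strongly with the labels on its own, almost all of that information is already supplied by g1, so the second pick is g3.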

In the paper “Microarray sub-grid detection: a novel algorithm” by Morris, Wang and Liu, a novel algorithm for detecting microarray sub-grids is proposed, where the only input to the proposed algorithm is the raw microarray image, which can be at any resolution, and the sub-grid detection is performed with no prior assumptions. The algorithm consists of a series of methods of spot shape detection, spot filtering, spot spacing estimation, and sub-grid shape detection. The algorithm is shown to be successful in dividing images of varying quality into sub-grid regions with no manual interaction. The algorithm is robust against high levels of noise, and high percentages of poorly expressed or missing spots.

Most cellular processes are believed to be carried out by groups of highly interacting proteins called functional modules, protein complexes, or molecular complexes. Recent large-scale high-throughput experiments, and integration of published data, have generated large protein-protein interaction (PPI) networks. In these networks, the nodes represent proteins, and the edges are interacting pairs of proteins. Protein complexes can be detected by identifying highly connected sets of proteins in the PPI networks. Computational identification of functional modules or protein complexes can provide an inexpensive guideline for biological experiments.

In the paper “Multilevel approaches for large-scale proteomic networks” by Oliveira and Seok, triangular clique (TC) based multilevel algorithms and further maximal clique merging multilevel algorithms are developed. A 2-core network of a proteomic network is constructed by recursively removing all nodes that have degree less than two. The quality of the super-nodes obtained with the clique merging multilevel algorithms is better, even though their time complexity is much higher than that of other matching-based algorithms because all maximal cliques must be enumerated in advance. All maximal cliques are enumerated fairly quickly on scale-free networks. Most of the time is spent in the refining step, especially when dealing with many clusters.
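The 2-core construction described above, recursive removal of nodes with degree less than two, can be sketched as follows. The function name, the edge-list input format and the protein identifiers are assumptions for illustration, and the sketch assumes an undirected network without self-loops.

```python
def two_core(edges):
    """Return the adjacency sets of the 2-core of an undirected graph.

    edges: iterable of (u, v) pairs with u != v (no self-loops assumed).
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Start with every node whose degree is already below two.
    pending = [n for n, nbrs in adjacency.items() if len(nbrs) < 2]
    while pending:
        node = pending.pop()
        if node not in adjacency:
            continue  # already removed in an earlier step
        for nbr in adjacency.pop(node):
            adjacency[nbr].discard(node)
            # Removing a node can push a neighbour below degree two.
            if len(adjacency[nbr]) < 2:
                pending.append(nbr)
    return adjacency

# Toy interaction network: a triangle P1-P2-P3 with a chain P3-P4-P5 attached.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
core = two_core(edges)
print(sorted(core))
```

Here P5 has degree one and is removed first, which drops P4 to degree one so that it is removed as well, leaving only the triangle.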

In another paper “Comparison study on two kernel-based learning algorithms for predicting the distance range between antibody interface residues and antigen surface” by Shi and Wan et al., a kernel-based learning algorithm called Multiple Criteria Quadratic Programming (MCQP) is developed to predict the distance range between antibody interface residues and the antigen surface in an antigen-antibody complex. The interaction is established between antibody interface residues and the antigen in studying antibody functions. The prediction results of the proposed MCQP are compared with those of the Support Vector Machine (SVM) algorithm, and it is shown that the MCQP algorithm classifies observations into distinct groups via a hyper-plane based on multiple criteria.

This Special Issue is a timely reflection of the research progress in the area of algorithm-based biological data analysis that helps bridge the gap between computer mathematics and bioinformatics. Finally, we would like to acknowledge all authors for their efforts in submitting high-quality papers. We are also very grateful to the reviewers for their thorough and on-time reviews of the papers. Last, but not least, our deepest gratitude goes to Professor E. H. Twizell (Editor-in-Chief) of the International Journal of Computer Mathematics for his consideration, encouragement, and advice on publishing this Special Issue.

March 2007
