Abstract
A novel representation of proteins is introduced that is independent of arbitrary decisions about which labels are assigned to the 20 natural amino acids. The approach assigns 20 unit vectors in a 20-dimensional vector space to the 20 natural amino acids. A protein is then represented by a walk, that is, a sequence of steps in the 20-dimensional space, analogous to the walk in the (x, y) plane used for binary strings. A straightforward numerical characterization of a protein is obtained from the distance matrix associated with this walk, which combines information on the Euclidean distances between the amino acids of the protein sequence. The Line Distance matrix offers an additional numerical characterization of proteins, while the lengths of the steps of the walk in 20-D space allow construction of a “protein profile,” which represents the distribution of the average lengths of the steps and their powers.
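The walk and its Euclidean distance matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions not stated in the abstract: the alphabetical ordering of one-letter codes in `AMINO_ACIDS` is arbitrary, the walk is taken to be the running sum of the unit vectors (by analogy with the planar walk for binary strings), and the names `walk_points` and `distance_matrix` are hypothetical. The Line Distance matrix and the protein profile are not defined in the abstract and are therefore not sketched.

```python
import math

# Assumed ordering of the 20 natural amino acids (one-letter codes).
# Each residue is assigned the unit vector at its index in this string.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def walk_points(sequence):
    """Return the point reached after each residue of the sequence.

    Assumption: the walk is the running sum of the unit vectors,
    so each residue moves one unit along its own coordinate axis.
    """
    position = [0] * len(AMINO_ACIDS)
    points = []
    for residue in sequence:
        position[INDEX[residue]] += 1  # unit step along this residue's axis
        points.append(tuple(position))
    return points

def distance_matrix(points):
    """Return the matrix of Euclidean distances between all walk points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Toy sequence chosen only for illustration.
points = walk_points("ACCA")
D = distance_matrix(points)
```

Under these assumptions, relabeling the amino acids only permutes the coordinate axes, so the distance matrix `D` does not depend on the order chosen in `AMINO_ACIDS`.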
†Visitor, Emeritus, Department of Mathematics and Computer Science, Drake University, Des Moines, IA, USA.
Acknowledgment
This work was supported in part by the Ministry of Higher Education, Science and Technology of the Republic of Slovenia through Project P1-017: Modeling of relationship between chemical structure and properties (QSAR-QSPR).