Abstract
Predicting membrane protein types has a positive influence on further biological function analysis. To quickly and efficiently annotate the type of an uncharacterized membrane protein is a challenge. In this work, a system based on maximum variance projection (MVP) is proposed to improve the prediction performance of membrane protein types. The feature extraction step is based on a hybridization representation approach by fusing Position-Specific Score Matrix composition. The protein sequences are quantized in a high-dimensional space using this representation strategy. Some problems will be brought when analysing these high-dimensional feature vectors such as high computing time and high classifier complexity. To solve this issue, MVP, a novel dimensionality reduction algorithm is introduced by extracting the essential features from the high-dimensional feature space. Then, a K-nearest neighbour classifier is employed to identify the types of membrane proteins based on their reduced low-dimensional features. As a result, the jackknife and independent dataset test success rates of this model reach 86.1 and 88.4%, respectively, and suggest that the proposed approach is very promising for predicting membrane proteins types.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 60704047 and 60805001) and sponsored by Shanghai Pujiang Programme.