Abstract
In many practical machine learning applications, such as web page classification and protein shape classification, unlabelled instances are easy to obtain, whereas labelled instances are expensive to acquire. Semi-supervised learning (SSL) methods, including graph-based algorithms, have therefore attracted considerable research interest in recent years. However, most of these algorithms use the Gaussian function to calculate the edge weights of the graph. In this paper, we propose a novel edge weight for graph-based semi-supervised algorithms. In the new algorithm, label information from the problem is incorporated into the SSL algorithm, and the geodesic distance is used instead of the Euclidean distance to measure the distance between two instances. Furthermore, class prior knowledge from the problem is added to the target function. We focus on the learning with local and global consistency algorithm. We found that the effect of class prior knowledge may differ between low and high label rates. Experiments on two University of California Irvine (UCI) data sets and on United States Postal Service (USPS) handwritten digit recognition show that the proposed algorithm is effective.
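For orientation, the baseline referred to above (learning with local and global consistency on a graph with Gaussian edge weights) can be sketched as follows. This is a minimal illustration of the standard formulation only, not the weight proposed in this paper: it uses Euclidean rather than geodesic distance and no class prior. The function names, the bandwidth `sigma` and the parameter `alpha` are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian edge weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Baseline weight only (Euclidean distance); `sigma` is an assumed bandwidth.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def llgc(W, Y, alpha=0.99):
    """Closed-form local and global consistency: F = (I - alpha * S)^{-1} Y.

    W : (n, n) symmetric non-negative weight matrix with zero diagonal.
    Y : (n, c) initial label matrix, one-hot rows for labelled points,
        all-zero rows for unlabelled points.
    alpha : assumed propagation parameter in (0, 1).
    Returns the predicted class index for every point.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)
```

A different edge weight, such as one built from geodesic distances and label information, would replace `gaussian_weights` in this sketch while leaving the propagation step unchanged.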
Acknowledgements
This work was supported by grants from the National Science Foundation of China (grant nos. 61100161, 61272333, 61005007, 61175022, 61005010 and 30900321), the Knowledge Innovation Program of the Chinese Academy of Sciences (grant nos. Y023A61121 and Y023A11292), the Open Fund Project of the State Key Laboratory of Software Novel Technology, Nanjing University, People's Republic of China (KFKT2012B26) and the Open Fund Project of the State Key Laboratory of Software Engineering, Wuhan University, People's Republic of China (SKLSE2012-09-25).