ABSTRACT
The result of feature selection for software birthmarks has a direct bearing on the software recognition rate. In this paper, we apply constrained clustering to analyze software features (SF). The within-class (homogeneous software) and between-class (heterogeneous software) distances of features are measured using mutual information. Information gain functions are constructed from homogeneous SF and penalty functions from heterogeneous SF, and redundancy is measured with correlation coefficients. The software birthmark features with high class distinction and minimum redundancy are then selected. An example framework for extracting and detecting birthmark features is also given. The algorithm is analyzed and compared with similar algorithms, and the results show that it provides an effective approach to software birthmark selection and optimization.
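As an illustration of the selection criterion only, the following minimal sketch ranks discretized features by mutual information with the class label and penalizes correlation with features already chosen. It is a simplified greedy stand-in, not the constrained-clustering algorithm of this paper: the gain and penalty functions built from homogeneous and heterogeneous SF are not reproduced, and the function names, the toy feature names (`f1`, `f2`, `f3`) and the score (relevance minus mean absolute correlation) are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant sequence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(features, labels, k):
    """Greedily pick k features with high relevance and low redundancy.

    Assumed score: MI(feature, label) minus the mean absolute correlation
    with the features selected so far. This is an illustrative choice,
    not the scoring function of the paper.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): four software classes, two samples each.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],  # informative
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of f1 (redundant)
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],  # informative, uncorrelated with f1
}
chosen = select_features(features, labels, k=2)
```

On this toy data all three features carry the same mutual information with the label, but `f2` duplicates `f1`, so the redundancy term steers the second pick to `f3`.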
Acknowledgements
We would like to thank everyone who helped us throughout the writing of this paper and the evaluation of the system.
Disclosure statement
No potential conflict of interest was reported by the authors.