
A discriminative random sampling strategy with individual-author feature selection for writeprint recognition of Chinese texts

Pages 94-101 | Received 14 Aug 2015, Accepted 28 Feb 2016, Published online: 23 Mar 2016
 

Abstract

Automatic authorship recognition has become a novel technique for investigating cybercrime. The main challenge, however, is that even a moderate-sized corpus yields a huge number of features, which leads to over-training. Moreover, it is hard to distinguish among potential authors using a single feature set alone. In this paper, we propose a random-sampling-style ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The whole feature set is partitioned into individual-author feature sets (IAFSs), each selected heuristically using that author's training set, and the method randomly picks writing-style features from each IAFS. Multiple base classifiers (BCs) are then built on the sampled feature sets, and all BCs are fused to reach a final decision. Experimental results on real-life Chinese forum data verify the robustness of the proposed method compared with conventional ensemble methods. We also analyze the diversity of the ensemble, revealing that the proposed strategy is more effective and constructs more diverse BCs than random subspace methods.
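The pipeline described in the abstract (IAFS selection, random sampling of features within each IAFS, one base classifier per sampled feature set, and fusion of all BCs) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the IAFS heuristic, the base classifier, the fusion rule, or the diversity measure. The names `select_iafs`, `NearestCentroid`, `train_ensemble`, `ensemble_predict`, and `disagreement`, as well as the mean-difference feature scoring, nearest-centroid BCs, plurality voting, and pairwise-disagreement diversity, are all hypothetical stand-ins, and the toy data is made up.

```python
import random
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence

Vector = List[float]


def select_iafs(X: Sequence[Vector], y: Sequence[str], k: int) -> Dict[str, List[int]]:
    """Hypothetical IAFS heuristic: for each author, keep the k features whose
    mean value in that author's documents most exceeds the mean over all other
    authors' documents."""
    n_features = len(X[0])
    iafs: Dict[str, List[int]] = {}
    for author in sorted(set(y)):
        own = [x for x, label in zip(X, y) if label == author]
        rest = [x for x, label in zip(X, y) if label != author]

        def score(j: int) -> float:
            rest_mean = mean(x[j] for x in rest) if rest else 0.0
            return mean(x[j] for x in own) - rest_mean

        iafs[author] = sorted(range(n_features), key=score, reverse=True)[:k]
    return iafs


class NearestCentroid:
    """Stand-in base classifier (assumption): nearest class centroid,
    computed only on the sampled feature indices."""

    def __init__(self, features: List[int]):
        self.features = features
        self.centroids: Dict[str, Vector] = {}

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "NearestCentroid":
        for author in set(y):
            rows = [[x[j] for j in self.features] for x, label in zip(X, y) if label == author]
            self.centroids[author] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> str:
        sub = [x[j] for j in self.features]

        def sq_dist(author: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(sub, self.centroids[author]))

        # Iterate labels in sorted order so ties resolve deterministically.
        return min(sorted(self.centroids), key=sq_dist)


def train_ensemble(X, y, k=2, per_author=1, n_classifiers=9, seed=0):
    """Build BCs on feature sets sampled at random from every author's IAFS."""
    rng = random.Random(seed)
    iafs = select_iafs(X, y, k)
    ensemble = []
    for _ in range(n_classifiers):
        sampled = set()
        for feats in iafs.values():
            # Draw per_author features from this author's IAFS.
            sampled.update(rng.sample(feats, min(per_author, len(feats))))
        ensemble.append(NearestCentroid(sorted(sampled)).fit(X, y))
    return ensemble


def ensemble_predict(ensemble, x: Vector) -> str:
    """Fuse BC outputs by plurality vote (assumed fusion rule)."""
    votes = Counter(bc.predict(x) for bc in ensemble)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def disagreement(ensemble, X: Sequence[Vector]) -> float:
    """Mean pairwise disagreement between BCs, one common diversity measure."""
    preds = [[bc.predict(x) for x in X] for bc in ensemble]
    pairs = [(i, j) for i in range(len(preds)) for j in range(i + 1, len(preds))]
    if not pairs:
        return 0.0
    return mean(sum(a != b for a, b in zip(preds[i], preds[j])) / len(X) for i, j in pairs)


if __name__ == "__main__":
    # Toy 6-feature "writeprint" vectors for three authors (made-up data).
    X = [
        [5, 0, 1, 0, 2, 0], [4, 1, 0, 0, 3, 0],
        [0, 5, 0, 1, 0, 2], [1, 4, 0, 0, 0, 3],
        [0, 0, 5, 4, 0, 1], [0, 1, 4, 5, 1, 0],
    ]
    y = ["A", "A", "B", "B", "C", "C"]
    ensemble = train_ensemble(X, y)
    print(select_iafs(X, y, 2))  # {'A': [0, 4], 'B': [1, 5], 'C': [2, 3]}
    print(ensemble_predict(ensemble, [5, 0, 0, 0, 3, 0]))  # A
    print(round(disagreement(ensemble, X), 3))
```

Unlike a random subspace method, which samples features uniformly from the whole feature set, this sketch guarantees that every BC sees at least one feature chosen as discriminative for each author, while the random draw within each IAFS is what varies the BCs from one another.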

Acknowledgments

The authors sincerely thank anonymous reviewers for their constructive comments, which helped improve this paper.
