A discriminative random sampling strategy with individual-author feature selection for writeprint recognition of Chinese texts

Zhi Liu, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

Sanya Liu, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China. Correspondence: [email protected]

Lin Liu, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

Meng Wang, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

Jianwen Sun, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

Xian Peng, National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, P.R. China

Abstract

Automatic authorship recognition has become a novel technique for investigating cybercrimes. A central challenge is that a moderate-sized corpus yields a huge number of features, which leads to over-training (overfitting). In addition, it is hard to distinguish between potential authors using only a single feature set. In this paper, we propose a random-sampling ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The proposed method randomly picks writing-style features from each individual-author feature set (IAFS), which is partitioned from the whole feature set. The IAFSs are heuristically selected using the training set of each author. Multiple base classifiers (BCs) are then built on the sampled feature sets, and all BCs are fused to reach a final decision. Experimental results on real-life Chinese forum data verify the robustness of the proposed method compared with conventional ensemble methods. We also analyze the diversity of the algorithm and show that the ensemble strategy is more effective and constructs more diverse BCs than random subspace methods.
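The pipeline summarized in the abstract (per-author feature selection, random sampling from each IAFS, base classifiers, fusion) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the selection heuristic (difference between an author's feature means and everyone else's), the nearest-centroid base classifier, plurality voting as the fusion rule, the toy data, and all function and parameter names (`select_iafs`, `sample_features`, `k`, `m`, and so on).

```python
import random
from collections import Counter

def select_iafs(X, y, k):
    """Assumed heuristic: per author, keep the k features whose mean differs
    most between that author's texts and all other authors' texts.
    Requires at least two authors in y."""
    iafs = {}
    for author in sorted(set(y)):
        own = [x for x, label in zip(X, y) if label == author]
        rest = [x for x, label in zip(X, y) if label != author]
        scores = []
        for j in range(len(X[0])):
            own_mean = sum(x[j] for x in own) / len(own)
            rest_mean = sum(x[j] for x in rest) / len(rest)
            scores.append((abs(own_mean - rest_mean), j))
        scores.sort(reverse=True)
        iafs[author] = [j for _, j in scores[:k]]
    return iafs

def sample_features(iafs, m, rng):
    """Draw m features at random from every author's IAFS and merge them."""
    chosen = set()
    for features in iafs.values():
        chosen.update(rng.sample(features, min(m, len(features))))
    return sorted(chosen)

def train_centroids(X, y, features):
    """Stand-in base classifier: one mean vector per author on the subset."""
    centroids = {}
    for author in set(y):
        rows = [x for x, label in zip(X, y) if label == author]
        centroids[author] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids

def predict_one(centroids, features, x):
    """Return the author whose centroid is closest in squared distance."""
    def dist(author):
        return sum((x[j] - c) ** 2 for j, c in zip(features, centroids[author]))
    return min(sorted(centroids), key=dist)

def fit_ensemble(X, y, n_classifiers=15, k=2, m=1, seed=0):
    """Build one base classifier per random draw from the IAFSs."""
    rng = random.Random(seed)
    iafs = select_iafs(X, y, k)
    ensemble = []
    for _ in range(n_classifiers):
        features = sample_features(iafs, m, rng)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble

def predict(ensemble, x):
    """Fuse the base classifiers by plurality vote (assumed fusion rule)."""
    votes = Counter(predict_one(c, f, x) for f, c in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: six style features, three authors, two texts each.
X_train = [
    [5, 5, 0, 0, 0, 0], [6, 4, 1, 0, 0, 1],   # author "a"
    [0, 0, 5, 5, 0, 0], [1, 0, 6, 4, 0, 1],   # author "b"
    [0, 0, 0, 0, 5, 5], [0, 1, 1, 0, 6, 4],   # author "c"
]
y_train = ["a", "a", "b", "b", "c", "c"]
ensemble = fit_ensemble(X_train, y_train)
print(predict(ensemble, [5, 6, 0, 1, 0, 0]))
```

Because each base classifier sees a different draw from every author's IAFS, the ensemble members differ from one another while still being restricted to features judged discriminative for at least one author. A plain random subspace method would instead sample from the whole feature set without the per-author partition.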

Acknowledgments

The authors sincerely thank the anonymous reviewers for their constructive comments, which helped improve this paper.
