A Model-based Feature Optimization Approach to Chinese Language Processing

Pages 55-81 | Published online: 28 Oct 2014
 

Abstract

Many approaches to automatic classification begin with a prescribed set of features. For Chinese aspect classification, these are normally prescribed as several integrated linguistic feature sets involving temporal, lexical-aspectual, or grammatical features. The number of features tends to grow as designers refine the conditions for classification, until the features eventually have to be optimized to eliminate those that are useless or contradictory. Because they are discrete, the features for Chinese aspect classification are difficult to optimize, unlike those used in other classification tasks. This study proposes a model-based approach that optimizes the features for Chinese aspect classification, illustrated with ZHE aspect markers, by estimating, processing, and testing the correlations between the features. As preparation for building the model, dummy variables are first adopted to represent the discrete Chinese ZHE aspect features. The correlations among the features are then estimated with contingency tables, and highly correlated variables are combined using Principal Component Analysis. Finally, the performance of the original and the optimized features is verified empirically with logistic models. The 26 optimized feature sets, reduced from the original 40, perform better in comparisons made before and after optimization. Model-based feature selection approaches, used extensively in economics, have so far rarely been applied in NLP for Chinese. The approach sheds new light on feature selection in NLP and has implications for generating rules that revise Chinese ZHE aspects to their target English categories before automatic translation into English.
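The first two steps named in the abstract, dummy coding of discrete features and correlation estimates from contingency tables, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature names and values are hypothetical toy data, not the paper's ZHE features, and Cramér's V is one common contingency-table association measure that may differ from the statistic the authors used. The Principal Component Analysis and logistic-model steps are not shown.

```python
from collections import Counter
from math import sqrt


def dummy_code(values):
    """Dummy (one-hot) encode a list of categorical values."""
    levels = sorted(set(values))
    rows = [[1 if v == lev else 0 for lev in levels] for v in values]
    return levels, rows


def contingency_table(xs, ys):
    """Cross-tabulate two categorical features as a nested list of counts."""
    row_levels, col_levels = sorted(set(xs)), sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


def cramers_v(table):
    """Cramér's V: the chi-square statistic rescaled to the range [0, 1]."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy features (illustrative only, not from the paper).
verb_class = ["stative", "stative", "activity", "activity", "stative", "activity"]
posture = ["yes", "yes", "no", "no", "yes", "no"]
adverb = ["a", "b", "a", "b", "a", "b"]

levels, coded = dummy_code(verb_class)
print(levels, coded[0])

# Pairs with V close to 1 are candidates for being combined.
print(cramers_v(contingency_table(verb_class, posture)))
print(cramers_v(contingency_table(verb_class, adverb)))
```

In this toy data `posture` is fully determined by `verb_class`, so the pair scores much higher than `verb_class` and `adverb`. In a pipeline like the one the abstract describes, pairs of that kind would be the ones passed on to be combined.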

Notes

1 This Project is supported by the National Social Science Foundation of China (No. 08BYY001).

2 These six target English categories of Chinese aspect were built by Qu (Citation2008) and Qu et al. (Citation2010), based on the aspect theories of Comrie (Citation1976), Smith (Citation1997), Olsen (Citation1997), Chen (Citation2003, Citation2008), and Xiao and McEnery (Citation2004).

3 Simple tense with imperfective implication is the target English category of ZHE aspect, quoted from the classifications made by Olsen (Citation1997).

4 “Others” refers to all the non-English aspect constructions, such as noun phrases, adjective phrases, prepositional phrases, etc.

5 Simple tense with perfective implication is the target English category of ZHE aspect, quoted from the classifications made by Olsen (Citation1997).

6 “Imperfective” refers to the English simple tense with imperfective implications based on the classifications made by Olsen (Citation1997).

7 “Perfective” refers to the English simple tense with perfective implications based on the classifications made by Olsen (Citation1997).

8 The results of Y2Y6 are not listed in this paper due to the limitation of the paper length.

9 The results of Y2Y6 are omitted in this paper due to the limitations of the paper length.

10 There are altogether 200 ZHE aspect features grouped into 40 sets. <DUOBJECT-PRCS>, <PRCS-MAINV>, <PRCS-SUBF>, and <PRCS-SUBP> are the sub-features of the X24 set. <STAT_MAINV>, <STAT_SUBF>, and <STAT_SUBP> are the sub-features of the X26 set.
