Bayesian Induction of Verb Sub-categorization Frames in Imbalanced Heterogeneous Data: Journal of Quantitative Linguistics: Vol 12, No 2-3

Abstract

This paper addresses the problem of severe class imbalance in a binary classification task: deciding whether a syntactic construction (environment) that co-occurs with a verb in a natural text corpus constitutes a subcategorization frame of that verb. Each environment is encoded as a vector of heterogeneous attributes, and the data exhibit a very high imbalance between positive and negative examples (a ratio of approximately 1:80). To cope with the abundance of negative examples, we propose a search tactic during training that uses Tomek links to eliminate unnecessary negative examples from the training set. As the classification mechanism, we argue that Bayesian networks are well suited, and we propose a novel network structure that handles heterogeneous attributes efficiently without discretization and is more classification-oriented. Compared with other known machine learning algorithms, our methodology performs significantly better at detecting instances of the rare positive class.
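As a rough illustration of the Tomek-link idea mentioned in the abstract, the sketch below removes the negative member of every Tomek link, that is, of every pair of examples from opposite classes that are each other's nearest neighbor. It is not the authors' implementation: the function names, the toy data, and the use of plain Euclidean distance are assumptions made for illustration, and the heterogeneous attributes described in the abstract would need a distance suited to mixed attribute types.

```python
import math


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    best, best_dist = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)  # Euclidean distance (an assumption)
        if d < best_dist:
            best, best_dist = j, d
    return best


def remove_tomek_negatives(points, labels, negative=0):
    """Drop the negative example of every Tomek link.

    Two examples form a Tomek link when they carry different labels
    and each is the other's nearest neighbor.
    """
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        is_link = nn[j] == i and labels[i] != labels[j]
        if is_link and labels[i] == negative:
            drop.add(i)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data (hypothetical): one positive example surrounded by negatives.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 9.0)]
toy_labels = [1, 0, 0, 0, 0]
clean_points, clean_labels = remove_tomek_negatives(toy_points, toy_labels)
```

In this toy set only the negative at (0.1, 0.0) forms a Tomek link with the positive example, so it is the only example removed; negatives far from the class boundary are left in place.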

