Abstract
This paper presents the application of memory-based learning to a balanced Modern Greek corpus for detecting the boundaries and the types of main and secondary clauses. For boundary detection, every token is considered a candidate boundary. Learning instances are formed from very basic linguistic properties of the candidate boundary and of the tokens preceding and following it. For the training instances, each token was manually tagged according to whether it constitutes the beginning, the end, or the inside of a clause, or a one-word clause. For the recognition of clause types, clauses were tagged with one of the 12 types of Modern Greek clauses (one type for main clauses and 11 types for secondary clauses). Learning was performed at two levels: the classification results of the first level were used in a second training process, which helps the learner learn from its own mistakes. The minimal information required as input makes the methodology easily portable to other languages, unlike previous approaches that rely on language-dependent empirical rules.
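The token-level formulation described above can be illustrated with a short sketch. It is not the paper's implementation: the tag names `B`, `I`, `E`, `S`, the use of a single part-of-speech label per token, the window size, the padding symbol, the function names, and the choice of a 1-nearest-neighbour classifier with a simple overlap count are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the sentence

Instance = Tuple[Tuple[str, ...], str]  # (windowed features, tag)


def boundary_tags(clause_lengths: Sequence[int]) -> List[str]:
    """Convert consecutive clause lengths into one tag per token.

    Assumed tag names: B = beginning, I = inside, E = end,
    S = one-word clause.
    """
    tags: List[str] = []
    for n in clause_lengths:
        if n < 1:
            raise ValueError("clause length must be positive")
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (n - 2) + ["E"])
    return tags


def make_instances(features: Sequence[str], tags: Sequence[str],
                   window: int = 2) -> List[Instance]:
    """Build one instance per token (every token is a candidate boundary).

    Each instance holds the feature of the token itself plus the features
    of `window` tokens before and after it, paired with the token's tag.
    """
    if len(features) != len(tags):
        raise ValueError("features and tags must have the same length")
    padded = [PAD] * window + list(features) + [PAD] * window
    return [(tuple(padded[i:i + 2 * window + 1]), tag)
            for i, tag in enumerate(tags)]


def overlap(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the positions at which two feature vectors agree."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory: Sequence[Instance], query: Sequence[str]) -> str:
    """Return the tag of the stored instance most similar to `query`.

    This is a 1-nearest-neighbour lookup; ties go to the instance
    stored first.
    """
    if not memory:
        raise ValueError("memory is empty")
    best = max(memory, key=lambda inst: overlap(inst[0], query))
    return best[1]
```

As a usage example, `boundary_tags([3, 1, 2])` tags a six-token sentence consisting of a three-word clause, a one-word clause, and a two-word clause; passing those tags together with six per-token labels to `make_instances` yields the stored memory that `classify` searches. A second learning level could, under the same assumptions, append the first-level predictions to each window as extra features before a second round of training.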