Original Articles

A Table Look-Up Parser in Online ILTS Applications

Pages 49-62 | Published online: 16 Feb 2007

Abstract

A simple table look-up parser (TLUP) has been developed for parsing, and consequently diagnosing syntactic errors in, the semi-free-format input sentences that learners type into an intelligent language tutoring system (ILTS). The TLUP finds a parse tree for a correct version of an input sentence and diagnoses the learner's syntactic errors by tracing the deviations of the input from the identified correct sentence and treating them as the learner's errors. The TLUP can then display the parse tree, with the diagnosed errors marked at the level of its leaves. This simple super-rule-based TLUP turns out to be a powerful pedagogic tool for coaching L2 (second language) learners on the grammar structures of L2: it saves computing time and running-space requirements, improves parsing accuracy, and makes real-time online running environments practicable. The TLUP is effective as long as the input does not deviate too much from the anticipated patterns embedded in the model answers, with the committed errors remaining within the framework of anticipation for the level of the ILTS designed. We establish the validity of a TLUP by an (n, c) lot acceptance sampling plan.

Notes

For example, it is known that, to generate a grammar tree for a sentence with n terms, a general-purpose probabilistic grammar parser requires O(n³) time in the sense of complexity analysis.
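
To see where the cubic cost comes from, here is a minimal CYK-style recognizer sketch in Python, assuming a toy grammar in Chomsky normal form (the grammar and sentence are hypothetical, and this is not the parser discussed in the paper). The three nested loops over span length, start position and split point yield the O(n³) behavior.

```python
# Minimal CYK recognizer sketch (toy CNF grammar; illustrative only).
lexical = {  # A -> 'word' rules
    "the": {"Det"}, "cat": {"N"}, "sleeps": {"VP"},
}
binary = {   # A -> B C rules, keyed by (B, C)
    ("Det", "N"): {"NP"}, ("NP", "VP"): {"S"},
}

def cyk_recognize(words, start="S"):
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i..j]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):           # O(n) span lengths
        for i in range(n - span + 1):      # O(n) start positions
            j = i + span - 1
            for k in range(i, j):          # O(n) split points -> O(n^3) overall
                for b in chart[i][k]:
                    for c in chart[k + 1][j]:
                        chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n - 1]

print(cyk_recognize("the cat sleeps".split()))  # True
```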

See Tokuda and Chen (2001) for the definition of similarity.

At http://web.unbc.ca/~chenl/translation, we can find that a total of 37,944 different correct sentences (i.e., correct English translations) can be taken as correct answers to the translation problem of a Japanese sentence meaning “Marine resources are available in abundance in Malaysia”.

In the formula of Sekine and Grishman (1995), a square of the term P(tag_j | word_j) is used.

A simple example of the coordination problem: “safety in (trucks and minivans)” versus “(safety in trucks) and minivans”. Simply adding parentheses is enough to clarify the scope of the coordination and thus resolve the ambiguity.

To be precise, the total number is 37,944; see also note 3. It is important to note that the developers did not actually collect or list all 37,944 sentences for this question at the time of template construction. Rather, it is the template structure that allows us to collect the sentences from every possible combination of paths embedded in the template. This means that we are able to extract as many as 37,944 sentences from the template after it has been constructed by language experts from a much smaller number of samples. This is the advantage of using templates. The reader is referred to Chen and Tokuda (2003) and Tokuda and Chen (2001) for examples of templates.
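
As an illustration of this combinatorial effect, here is a minimal sketch that enumerates every sentence from a hypothetical template modeled as a simple sequence of slots with alternative word strings; the real templates are network structures described in the cited papers, so this is only a simplification.

```python
from itertools import product

# Hypothetical template: each slot lists alternative surface strings.
template = [
    ["Marine resources", "Sea resources"],
    ["are available", "can be found", "exist"],
    ["in abundance", "abundantly"],
    ["in Malaysia."],
]

# Every path through the template yields one correct sentence.
sentences = [" ".join(parts) for parts in product(*template)]
print(len(sentences))   # 2 * 3 * 2 * 1 = 12 sentences from this small template
print(sentences[0])     # 'Marine resources are available in abundance in Malaysia.'
```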

If one word appears several times in one template, we regard the occurrences as separate words.

Although this is an assumption that should be verified when our program assigns part-of-speech tags to words in the templates, we have so far found no exceptions. If such an exception arises, we should be able to divide the template into two or more templates so that the above assumption is satisfied.

It is understandable that, even if a parser has an accuracy of only 75%, which is not a high standard for most available parsers, when it is used to parse three sentences the probability of its parsing all three of them wrongly is, assuming independent errors, an extremely low (0.25)³ ≈ 1.6%, which should be acceptable for teaching purposes. We can also see that, when a parser tags a sentence wrongly, it is very likely to assign wrong POST tags (extended POST tags) to many words in the sentence; this indicates that, when several sentences share the same word in a template, the chance that a parser produces a wrong parse without being caught is very low.

When the POST array of one word in a sentence is corrected manually or by “voting”, we should parse the whole sentence again to ensure the correctness of the tags of other words in the sentence.

If we regard the parser as a producer and the user as a consumer of the parser, then p0 actually denotes an Acceptable Quality Level (AQL), representing the baseline requirement for the quality of the producer's product, which is a parse tree of a set of correlated sentences, while p1 is the Lot Tolerance Percent Defective (LTPD), a designated high defect level that would be unacceptable to the consumer.
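
As a worked illustration, the following sketch computes the acceptance probability of an (n, c) single-sampling plan under a binomial model and evaluates the producer's and consumer's risks; the particular values of n, c, p0 and p1 below are assumptions for illustration, not those used in the paper.

```python
from math import comb

def p_accept(n, c, p):
    # Probability of accepting a lot under an (n, c) single-sampling plan:
    # inspect n items and accept if at most c are defective, given true
    # per-item defect probability p (binomial model).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, c = 50, 2          # hypothetical plan: sample 50 parses, allow <= 2 bad ones
p0, p1 = 0.01, 0.10   # hypothetical AQL and LTPD

print(f"producer's risk at p0: {1 - p_accept(n, c, p0):.3f}")  # rejecting a good lot
print(f"consumer's risk at p1: {p_accept(n, c, p1):.3f}")      # accepting a bad lot
```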

Possible “help” includes the manual assignment of part-of-speech tags to certain words or phrases, reparsing of all the correct sentences, and manual adjustment of the tree structures.

We can use a general-purpose parser to do both of the jobs or use a tagger for tagging purposes and a parser for the rest.

The assumption in the section ‘Assigning part-of-speech tags and disambiguation bits to error-free nodes in the templates’ guarantees that no conflict will occur at this stage. If a conflict does occur, we should go back and use more disambiguation bits in the parser.

For the templates we used in Azalea, we use Brill's tagger (Brill, 1994) to generate the POST tag arrays, and we use ApplePie (Sekine and Grishman, 1995) to obtain the parse trees in the above look-up table construction process.

  • Readers interested in theoretical analysis will be able to prove the following theorem:

    • Theorem: If the length of a matched correct sentence is n, and the size of the part-of-speech tag set is m, we are able to find the corresponding grammar tree within O(n log₂ m) time.

  • As a matter of fact, the part-of-speech tag set is always quite small, and thus the time complexity of finding a grammar tree can be regarded as O(n). Since the lengths of a matched correct sentence and the input sentence are of the same order, the complexity of generating the grammar tree is O(n). This is a marked improvement over the complexity of a general parser, which is O(n³); a lookup sketch consistent with the theorem follows this list.
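
The following is a minimal sketch of a lookup consistent with that bound, assuming the POST arrays of the correct sentences are stored in a trie whose children are kept sorted, so that each of the n positions is resolved by a binary search over at most m child tags; the tag names and the stored tree payload are hypothetical.

```python
from bisect import bisect_left

class TrieNode:
    def __init__(self):
        self.tags = []       # sorted child tags (at most m entries per node)
        self.children = []   # child nodes, parallel to self.tags
        self.tree = None     # grammar tree stored at the end of a POST array

def insert(root, post_array, tree):
    node = root
    for tag in post_array:
        i = bisect_left(node.tags, tag)        # keep children sorted
        if i == len(node.tags) or node.tags[i] != tag:
            node.tags.insert(i, tag)
            node.children.insert(i, TrieNode())
        node = node.children[i]
    node.tree = tree

def lookup(root, post_array):
    # n positions, each a binary search over <= m children: O(n log2 m).
    node = root
    for tag in post_array:
        i = bisect_left(node.tags, tag)
        if i == len(node.tags) or node.tags[i] != tag:
            return None
        node = node.children[i]
    return node.tree

root = TrieNode()
insert(root, ["NNS", "VBP", "JJ", "IN", "NNP"], "(S (NP ...) (VP ...))")  # toy payload
print(lookup(root, ["NNS", "VBP", "JJ", "IN", "NNP"]))  # '(S (NP ...) (VP ...))'
```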

The tagger can be downloaded from http://www.cs.jhu.edu/~brill/.

At http://web.unbc.ca/~chenl/translation, the file all-tranalations.txt contains all 37,944 different correct sentences (i.e., correct English translations) accepted as correct answers for the translation problem of a Japanese sentence meaning “Marine resources are available in abundance in Malaysia”. The file all-different.txt contains the 1,614 sentences that have distinct part-of-speech arrays; the part-of-speech array of any of the other 36,330 sentences reduces to that of one of these 1,614 sentences.
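
A minimal sketch of this reduction, using hypothetical sentences and tags rather than the files listed above: group the tagged sentences by their part-of-speech array and keep one representative per distinct array.

```python
# Keep one representative sentence per distinct part-of-speech array
# (toy tagged data; illustrative only).
tagged = [
    ("The resources are abundant .",   ("DT", "NNS", "VBP", "JJ", ".")),
    ("The supplies are plentiful .",   ("DT", "NNS", "VBP", "JJ", ".")),
    ("Resources abound in Malaysia .", ("NNS", "VBP", "IN", "NNP", ".")),
]

representatives = {}
for sentence, pos_array in tagged:
    representatives.setdefault(pos_array, sentence)  # first sentence wins

print(len(representatives))  # 2 distinct part-of-speech arrays
for pos_array, sentence in representatives.items():
    print(pos_array, "->", sentence)
```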

We can reduce the number of sentences even further if we take advantage of the common structures of sentences prevailing across different templates.
