N-grams: A Tool for Repairing Word Order Errors in ill-formed Texts
Year: | 2006 |
---|---|
Authors: | Theologos Athanaselis; Stylianos Bakamidis; Ioannis Dologlou; K. Mamouras |
Journal: | International Journal of Signal Processing |
Volume: | 3 |
Number: | 2 |
Pages: | 123-128 |
Abstract: | This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One possible way to reorder the words is to try all permutations, but a sentence of N words has N! permutations. The novelty of this method is an efficient confusion matrix technique for reordering the words, designed to reduce the search space among permuted sentences. The search space is limited using the statistical inference of N-grams. The results show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes a test set of TOEFL sentences was used, and the results show that more than 95% of them can be repaired using the proposed method. |
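
The exhaustive baseline mentioned in the abstract (score every permutation by its number of trigram hits and keep the best) can be illustrated with a minimal sketch. It assumes a toy set of known trigrams in place of a real language model, and the function names are hypothetical. It is the brute-force N! search, not the confusion matrix technique proposed in the paper, whose details are not given in this record.

```python
from itertools import permutations

def trigrams(words):
    """Return the consecutive word triples of a sentence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def trigram_hits(words, known_trigrams):
    """Count how many trigrams of the sentence occur in the known set."""
    return sum(1 for t in trigrams(words) if t in known_trigrams)

def best_reordering(words, known_trigrams):
    """Score all N! orderings and return the one with the most trigram hits.

    Assumption: exhaustive search stands in for the paper's reduced search.
    """
    return max(permutations(words),
               key=lambda p: trigram_hits(p, known_trigrams))

# Toy stand-in for a language model: trigrams from one hypothetical sentence.
corpus = ["the cat sat on the mat".split()]
known = {t for sentence in corpus for t in trigrams(sentence)}

scrambled = "cat the sat on mat the".split()
repaired = best_reordering(scrambled, known)
print(" ".join(repaired))  # the cat sat on the mat
```

With six words the search already visits 720 orderings, which is the growth that motivates reducing the search space.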