
Accurate phrase alignment in a bilingual corpus for EBMT systems

Type: Article in proceedings
Year: 2012
Authors: Γιώργος Ταμπουρατζής; Μιχαήλ Τρουλλινός; Μαρίνα Βασιλείου; Σωκράτης Σοφιανόπουλος
Editors: Reinhard Rapp; Marko Tadić; Serge Sharoff; Pierre Zweigenbaum
Book title: Proceedings of the 5th Workshop on Building and Using Comparable Corpora at LREC 2012
Pages: 104-111
Address: Istanbul, Turkey
Organization: ELRA, ELDA
Abstract:
An ongoing trend in the creation of Machine Translation (MT) systems concerns the automatic extraction of information from large bilingual parallel corpora. As these corpora are expensive to create, the largest possible amount of information needs to be extracted in a consistent manner. The present article introduces a phrase alignment methodology for transferring structural information between languages using only a limited-size parallel corpus. This is used as a first processing stage to support a phrase-based MT system that can be readily ported to new language pairs. The essential language resources used in this MT system include a large monolingual corpus and a small parallel one. An analysis of different alignment cases is provided and the solutions chosen are described. In addition, the application of the system to different language pairs is reported and the results obtained are compared across language pairs to investigate the language-independent aspect of the proposed approach.