SOM-based corpus modeling for disambiguation purposes in MT

RESEARCH

Research area:

Other Linguistic Topics

Type:

Article in proceedings

Year:	2012

Authors:	Γιώργος Ταμπουρατζής; Γεώργιος Τσατσανίφος; Ιωάννης Δολόγλου; Tsimboukakis N.

Book title:	MTW-2012- Hybrid Machine Translation, Proceedings of the Workshop on Hybrid Machine Translation, held within the TSD-2012 Conference

Pages:	pp. 29-36
Address:	Brno, Czech Republic

Date:	September 3, 2012
ISBN:	978-80-263-0266-7



Abstract:	The PRESEMT project constitutes a novel approach to the machine translation (MT) task. The project aims to develop a language-independent MT system architecture that is readily portable to new language pairs. PRESEMT falls within the Corpus-based MT (CBMT) paradigm, using a small bilingual parallel corpus and a large target-language (TL) monolingual corpus. The present article investigates the process of selecting the best translation for a given token by choosing among a set of suggested translations. For this disambiguation task, a dedicated module based on the Self-Organizing Map (SOM) model is presented. Though the SOM has been studied extensively for text processing applications, its application to translation disambiguation is novel. The features used to project textual data onto the SOM lattice are described. Details are provided on the modifications required to model very large corpora and on experimental results of integrating the SOM into the PRESEMT system.
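
Illustration (not taken from the paper): the abstract does not specify the features or the selection rule, so the following is only a minimal sketch of how a SOM could in principle be used to choose among candidate translations. It assumes that every context and every candidate is already encoded as a small numeric vector, and that the preferred candidate is the one whose best-matching unit lies closest on the lattice to that of the context. The function names (`train_som`, `bmu`, `pick_translation`), the toy vectors and the selection rule are all hypothetical.

```python
import math
import random


def bmu(weights, x):
    """Return the lattice position (row, col) of the node closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            br, bc = bmu(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def pick_translation(weights, context_vec, candidates):
    """Hypothetical rule: pick the candidate mapped nearest to the context."""
    cr, cc = bmu(weights, context_vec)

    def lattice_dist(item):
        r, c = bmu(weights, item[1])
        return (r - cr) ** 2 + (c - cc) ** 2

    return min(candidates, key=lattice_dist)[0]


# Toy feature vectors standing in for corpus-derived features (assumed).
corpus = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9], [0.1, 0.2, 0.8]]
som = train_som(corpus)
context = [0.9, 0.1, 0.0]
candidates = [("candidate_a", [0.9, 0.1, 0.0]),
              ("candidate_b", [0.0, 0.1, 0.9])]
chosen = pick_translation(som, context, candidates)
```

A real system would derive the vectors from the monolingual corpus and would need the scaling modifications the abstract mentions; neither is modelled here.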