SOM-based corpus modeling for disambiguation purposes in MT

PROFILE

Research Area:

Other topics in Linguistics

Type:

In Proceedings

Year:	2012

Authors:	George Tambouratzis; Georgios Tsatsanifos; Ioannis Dologlou; N. Tsimboukakis

Book title:	MTW-2012: Hybrid Machine Translation, Proceedings of the Workshop on Hybrid Machine Translation, held within the TSD-2012 Conference

Pages:	pp. 29-36
Address:	Brno, Czech Republic

Date:	September 3, 2012
ISBN:	978-80-263-0266-7



Abstract:	The PRESEMT project constitutes a novel approach to the machine translation (MT) task. It aims to develop a language-independent MT system architecture that is readily portable to new language pairs. PRESEMT falls within the corpus-based MT (CBMT) paradigm, using a small bilingual parallel corpus and a large target-language (TL) monolingual corpus. The present article investigates how to select the best translation for a given token from a set of candidate translations. For this disambiguation task, a dedicated module based on the Self-Organizing Map (SOM) model is presented. Although the SOM has been studied extensively in text processing applications, its use for translation disambiguation is novel. The article describes the features used to project textual data onto the SOM lattice, details the modifications required to model very large corpora, and reports experimental results from integrating the SOM into the PRESEMT system.
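The abstract does not spell out how a SOM is trained or queried. As a reminder of the generic technique only (not the PRESEMT module, whose features and large-corpus modifications are not reproduced here), the sketch implements the textbook online SOM update: find the best-matching unit (BMU) for an input vector, then pull every node's weights toward the input, scaled by a Gaussian neighbourhood on the lattice. The linear decay of the learning rate and neighbourhood width are assumed choices. The `choose_translation` helper is hypothetical: it assumes the context and each candidate translation have already been encoded as numeric feature vectors, and it picks the candidate whose BMU lies closest on the lattice to the context's BMU.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) node whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Generic online SOM training on a rows x cols lattice (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    # One weight vector per lattice node, initialised uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood width
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood based on squared lattice distance to the BMU.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def choose_translation(weights, context_vec, candidates):
    """Hypothetical disambiguation step: pick the candidate mapped nearest to the context.

    candidates maps each candidate translation to its (assumed) feature vector.
    """
    ctx = best_matching_unit(weights, context_vec)

    def lattice_dist(name):
        node = best_matching_unit(weights, candidates[name])
        return (node[0] - ctx[0]) ** 2 + (node[1] - ctx[1]) ** 2

    return min(candidates, key=lattice_dist)
```

For example, with a hand-built one-row map `{(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (0, 2): [2.0, 0.0]}`, a context vector `[0.1, 0.0]` maps to node `(0, 0)`, so a candidate encoded as `[0.2, 0.0]` is preferred over one encoded as `[1.9, 0.0]`. Any realistic use would depend entirely on how tokens and contexts are turned into vectors, which is the part the article itself addresses.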