PROFILE
SOM-based corpus modeling for disambiguation purposes in MT
Research Area:  
Other topics in Linguistics
Type:  
In Proceedings
Year: | 2012 |
---|---|
Authors: | George Tambouratzis; Georgios Tsatsanifos; Ioannis Dologlou; N. Tsimboukakis |
Book title: | MTW-2012 - Hybrid Machine Translation, Proceedings of the Workshop on Hybrid Machine Translation, held within the TSD-2012 Conference |
Pages: | pp. 29-36 |
Address: | Brno, Czech Republic |
Date: | September 3, 2012 |
ISBN: | 978-80-263-0266-7 |
Abstract: | The PRESEMT project constitutes a novel approach to the machine translation (MT) task. This project aims to develop a language-independent MT system architecture that is readily portable to new language pairs. PRESEMT falls within the Corpus-based MT (CBMT) paradigm, using a small bilingual parallel corpus and a large target-language (TL) monolingual corpus. The present article investigates the process of selecting the best translation for a given token by choosing from a set of suggested translations. For this disambiguation task, a dedicated module based on the Self-Organizing Map (SOM) model is presented. Though the SOM has been studied extensively for text processing applications, the present application to translation disambiguation is novel. The features employed to project textual data onto the SOM lattice are described. Details are provided on the modifications required to model very large corpora and on experimental results of integrating the SOM into the PRESEMT system. |