Project - ACCURAT: Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation

PROFILE

ACCURAT: Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation

Start date: 01-01-2010

End date: 30-06-2012

Funded by: ICT (FP7)

Project leader: Nicholas Glaros

Website: http://www.accurat-project.eu/

The ACCURAT project aims to research methods and techniques to overcome one of the central barriers in Machine Translation (MT): the lack of large-scale linguistic resources (i.e., parallel corpora) for under-resourced languages and narrow domains. The project will research and evaluate novel methods that exploit comparable corpora to compensate for this shortage and, ultimately, to significantly improve MT quality.

ACCURAT will provide researchers and developers with a methodology and a fully functional model for exploiting comparable corpora in MT, including:

methods for automatic acquisition of a comparable corpus from the Web and other sources;
comparability metrics, i.e., criteria to measure the comparability of source and target language documents in comparable corpora;
methods for alignment and extraction of lexical, terminological and other linguistic data from comparable corpora;
measurement of the improvements gained by applying the acquired data, against baseline results from statistical (SMT) and rule-based (RBMT) machine translation systems.
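
To make the idea of a comparability metric more concrete, here is a minimal sketch of one possible score. It is not the metric developed in ACCURAT; it assumes a simple approach in which source words are mapped through a small bilingual lexicon and the result is compared with the target document by cosine similarity over word counts. The function name `comparability`, the `lexicon` argument and the toy Estonian-English entries are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def comparability(src_tokens, tgt_tokens, lexicon):
    """Illustrative comparability score in [0, 1] (hypothetical helper).

    Source tokens are translated word by word through `lexicon`
    (a dict mapping source words to target words), then compared with
    the target tokens using cosine similarity over word counts.
    """
    # Words missing from the lexicon are simply ignored in this sketch.
    translated = Counter(lexicon[t] for t in src_tokens if t in lexicon)
    target = Counter(tgt_tokens)

    # Dot product over the words that survived translation.
    dot = sum(count * target[word] for word, count in translated.items())
    norm = sqrt(sum(c * c for c in translated.values())) * sqrt(
        sum(c * c for c in target.values())
    )
    # An empty document or an empty translation yields a score of 0.
    return dot / norm if norm else 0.0


# Toy lexicon, assumed for illustration only.
toy_lexicon = {"maja": "house", "suur": "big"}
score = comparability(["suur", "maja"], ["big", "house"], toy_lexicon)
```

A score near 1 suggests that the two documents share most of their translated vocabulary, while a score near 0 suggests little lexical overlap. A practical metric would likely also need to handle morphology, multi-word terms and lexicon coverage, which this sketch leaves out.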

Research areas

Multilingual Content Processing

Publications

Project Partners:
