Research Area - Multilingual Content Processing

Multilingualism, the cornerstone of our multicultural society, calls for affordable technologies and applications that enable communication and collaboration across languages, give speakers of every language equal access to the information society, and support each language in the advanced functionalities of networked ICT. To make such applications work, at least two strands of technological development are pursued:

  • Machine Translation (MT) and
  • Technologies for information and knowledge management including cross-lingual information retrieval (CLIR).

Multilingual technologies are not to be viewed in isolation from other language technology tracks (e.g. document and content production and management, authoring tools, etc.), or from other communication media and modalities (e.g. speech and speech interfaces, multimodal user interfaces, etc.).

Accomplishments so far:

  • automatic multilingual subtitling
  • combination of lexical approaches (bilingual lexical substitution) with monolingual corpus-based modelling
  • technical document translation
  • large parallel corpora processing and exploitation for bilingual lexicon acquisition
  • optimal, context-based lexical transfer selection
  • grammatical modelling in the target language to enhance fluency of the output
  • Tr·AID: Translation Memory
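
As an illustration of how bilingual lexical substitution can be combined with monolingual corpus-based modelling for context-based lexical transfer selection, the following is a minimal sketch and not the system described here. The toy lexicon, the toy target corpus, the language pair and the function names (`bigram_counts`, `score`, `select_transfer`) are all assumptions made for the example. Each source word is replaced by its candidate translations, and the candidate sequence that an add-one-smoothed bigram model of the target language scores highest is kept.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy bilingual lexicon: source word -> candidate translations.
LEXICON = {
    "la": ["the"],
    "banque": ["bank", "shore"],
    "ouvre": ["opens"],
}

# Hypothetical monolingual corpus in the target language.
TARGET_CORPUS = [
    "the bank opens at nine",
    "the bank closes at five",
    "the shore is rocky",
]


def bigram_counts(sentences):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(tokens, unigrams, bigrams):
    """Log-probability of a token sequence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(tokens)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def select_transfer(source_tokens, lexicon, unigrams, bigrams):
    """Try every combination of candidate translations and keep the most fluent one."""
    # Words missing from the lexicon are passed through unchanged.
    options = [lexicon.get(word, [word]) for word in source_tokens]
    return list(max(product(*options), key=lambda c: score(c, unigrams, bigrams)))


unigrams, bigrams = bigram_counts(TARGET_CORPUS)
best = select_transfer(["la", "banque", "ouvre"], LEXICON, unigrams, bigrams)
print(" ".join(best))
```

Exhaustive enumeration is only practical for short inputs; a realistic system would presumably use a beam or dynamic-programming search and a much larger language model.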

Current R&D focus:

  • automatic methods for discovery and annotation of parallel corpora for translation modelling
  • automatic methods for discovery of comparable corpora and metrics for comparability measurement, in an attempt to relax the requirement for strictly parallel corpora
  • methods for automatic domain classification of source texts and translation engine adaptation
  • further advances in the corpus-based machine translation paradigm through the introduction of a limited-size parallel corpus and a wide repertoire of pattern recognition and artificial intelligence techniques, applied to linguistic tasks ranging from the alignment of sentences and the creation of compatible phrases in different languages to the optimisation of system parameters

The combination of the above lines of research is expected to lead to substantial progress in translation quality and speed, domain adaptability, language portability, and ease of developing new language pairs.
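
To make the idea of a comparability metric for comparable corpora more concrete, here is one simple lexicon-overlap measure, sketched under stated assumptions. It is not necessarily the metric developed in this research; the function name `comparability`, the toy lexicon and the example texts are hypothetical. The score is the fraction of the source document's lexicon-covered vocabulary for which at least one translation occurs in the target document.

```python
def comparability(source_tokens, target_tokens, lexicon):
    """Fraction of lexicon-covered source words with a translation in the target text.

    Returns a value in [0, 1]; 0.0 if no source word is covered by the lexicon.
    """
    source_vocab = {word for word in source_tokens if word in lexicon}
    if not source_vocab:
        return 0.0
    target_vocab = set(target_tokens)
    matched = sum(1 for word in source_vocab if target_vocab & set(lexicon[word]))
    return matched / len(source_vocab)


# Hypothetical toy lexicon and documents, for illustration only.
lexicon = {"banque": ["bank"], "taux": ["rate"], "mer": ["sea"]}
source = "banque taux mer".split()
related = "the bank raised the rate".split()
unrelated = "mountains and forests".split()

print(comparability(source, related, lexicon))
print(comparability(source, unrelated, lexicon))
```

A score of this kind is asymmetric; it could be computed in both directions and averaged if a symmetric measure is needed.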