Autolexis: Automatic construction of bilingual lexica

Category: Services

Many conventional dictionaries now exist in machine-readable form. In computerised applications, however, the content of these "electronic dictionaries" usually cannot easily be made available automatically to text processing tools or computer-aided translation systems, such as translation memories or multilingual information retrieval systems. Moreover, conventional machine-readable dictionaries tend to cover the general language and so do not lend themselves to domain-specific applications. What is frequently needed is a special lexicon with good coverage of the specific domain or application.

Building such lexica by hand is tedious, time-consuming and usually very expensive. It requires not only bilingual lexicographic expertise but also substantial familiarity with the particular domain (for example, medicine, law or engineering), since providing a functional lexicon means solving many problems, two of the most challenging being:

  • Multiple meanings of words
  • Assessment of correctness of translation

Autolexis provides a tool for the automatic construction of a translation lexicon from parallel texts, that is, texts that have already been translated. The tool uses techniques that are independent of the languages in which the parallel texts are written, so it can be used for texts in any language and in any subject field.

Autolexis identifies translational equivalences at the word or multi-word level. It operates on parallel texts that have first been aligned at sentence level and have then undergone shallow linguistic analysis.
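To give a feel for how word-level equivalences can be proposed from sentence-aligned text, here is a minimal sketch based on co-occurrence counts and the Dice coefficient. This is a generic, language-independent technique chosen purely for illustration; the method Autolexis actually uses is not described here. The function name `dice_lexicon`, the `min_score` threshold and the toy corpus are all assumptions, and the input is assumed to be already tokenised and sentence-aligned.

```python
from collections import Counter
from itertools import product


def dice_lexicon(aligned_pairs, min_score=0.5):
    """Propose word-level equivalences from sentence-aligned text.

    aligned_pairs: iterable of (source_tokens, target_tokens) tuples,
    one tuple per aligned sentence pair (hypothetical input format).
    Returns {source_word: (best_target_word, dice_score)}.
    """
    src_count = Counter()   # sentence pairs containing each source word
    tgt_count = Counter()   # sentence pairs containing each target word
    pair_count = Counter()  # sentence pairs containing both words

    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))

    lexicon = {}
    for (s, t), joint in pair_count.items():
        # Dice coefficient: 2 * |both| / (|source| + |target|)
        score = 2 * joint / (src_count[s] + tgt_count[t])
        best_so_far = lexicon.get(s, (None, 0.0))[1]
        if score >= min_score and score > best_so_far:
            lexicon[s] = (t, score)
    return lexicon


# Toy, hand-made corpus for illustration only (not real Autolexis data).
toy_pairs = [
    (["search", "file"], ["αναζήτηση", "αρχείο"]),
    (["search", "database"], ["αναζήτηση", "βάση"]),
    (["open", "file"], ["άνοιγμα", "αρχείο"]),
]
lexicon = dice_lexicon(toy_pairs)
print(lexicon["search"])
```

On this toy corpus the highest-scoring candidate for "search" is "αναζήτηση", because the two words appear in exactly the same sentence pairs. A sketch like this handles single words only; multi-word equivalences would need candidate phrases to be extracted first, for example from the shallow linguistic analysis.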

Autolexis has been used to develop an English-Greek lexicon for the area of software systems, where it proved able to identify not only equivalences such as "search = αναζήτηση", but also "database file = αρχείο βάσης δεδομένων" and even "desktop = επιφάνεια χώρου εργασίας".

Autolexis is now offered on a service basis by the Institute for Language and Speech Processing.


Contact person: Stelios Piperidis

