Computational morphological and syntactic lexicon of Modern Greek

Category: Language resources

The Computational morphological and syntactic lexicon of Modern Greek was developed by ILSP / R.C. "Athena" within the framework of the LE-PAROLE project and can be used in Human Language Technology applications.

It comprises 20,149 lemmas annotated with morphological and syntactic information according to the PAROLE model, which is based on international linguistic standards. The LE-PAROLE project covers the compilation of lexicons for 12 European languages (Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish). The lexicons are encoded in SGML and follow a common DTD for all languages.

Lexicon contents
The approximately 20,000 lemmas were selected using a hybrid approach:

  • statistical processing of a corpus of ca. 9,000,000 words in order to identify the most frequent lemmas,
  • processing of the list of the most frequent lemmas according to linguistic criteria.
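The two selection steps above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the lexicon: it assumes the corpus is already lemmatized, and the function `select_lemmas`, its parameters, and the toy filter are hypothetical names introduced here. The linguistic criteria themselves are not specified in this description, so they are represented by a placeholder predicate.

```python
from collections import Counter
from typing import Callable, Iterable


def select_lemmas(
    lemmatized_tokens: Iterable[str],
    shortlist_size: int,
    passes_linguistic_criteria: Callable[[str], bool],
) -> list[str]:
    """Rank lemmas by corpus frequency, then filter the shortlist."""
    counts = Counter(lemmatized_tokens)
    # Step 1: statistical shortlist of the most frequent lemmas.
    shortlist = [lemma for lemma, _ in counts.most_common(shortlist_size)]
    # Step 2: linguistic filtering (the predicate is a stand-in for
    # whatever criteria a lexicographer would apply).
    return [lemma for lemma in shortlist if passes_linguistic_criteria(lemma)]


# Toy lemmatized corpus; a real run would stream millions of tokens.
tokens = ["και", "το", "σπίτι", "και", "το", "και", "γράφω", "σπίτι", "Αθήνα"]

# Hypothetical criterion for the example: drop capitalized (proper-name) lemmas.
selected = select_lemmas(
    tokens,
    shortlist_size=5,
    passes_linguistic_criteria=lambda lemma: not lemma[0].isupper(),
)
print(selected)
```

Keeping the frequency ranking and the linguistic filter as separate steps mirrors the hybrid approach: the statistical pass proposes candidates, and the second pass decides which of them become lexicon entries.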

More specifically, the lexicon includes

  • 20,149 morphological units, and
  • 25,092 syntactic units.

At the morphological level, lemmas encode information about their relations to other lemmas, spelling variants, etc., as well as information about their grammatical category (Part of Speech) and their inflection (inflectional paradigm, stems).

At the syntactic level, syntactic units encode the syntactic behaviour of a lemma, i.e. the complements the lemma selects, as well as the features required to characterise and identify these complements (e.g. whether a complement is a subject, realised as a noun in the nominative case, etc.).
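The two levels described above can be pictured with a small data model. This is only a sketch under stated assumptions: the class and field names (`MorphologicalUnit`, `SyntacticUnit`, `Complement`, and so on) are hypothetical and do not reproduce the element names of the PAROLE DTD, and attaching syntactic units directly to a morphological unit is a simplification chosen for the example. The sample entry is illustrative and is not taken from the lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class Complement:
    """One complement selected by a lemma, with its identifying features."""
    function: str                      # e.g. "subject"
    category: str                      # e.g. "noun"
    features: dict[str, str] = field(default_factory=dict)  # e.g. {"case": "nominative"}


@dataclass
class SyntacticUnit:
    """One syntactic behaviour of a lemma (one complementation pattern)."""
    complements: list[Complement] = field(default_factory=list)


@dataclass
class MorphologicalUnit:
    """A lemma with its morphological description."""
    lemma: str
    part_of_speech: str
    inflectional_paradigm: str         # placeholder label, not a PAROLE code
    stems: list[str] = field(default_factory=list)
    spelling_variants: list[str] = field(default_factory=list)
    related_lemmas: list[str] = field(default_factory=list)
    syntactic_units: list[SyntacticUnit] = field(default_factory=list)


# Illustrative entry: a verb with an intransitive and a transitive pattern.
subject = Complement("subject", "noun", {"case": "nominative"})
direct_object = Complement("object", "noun", {"case": "accusative"})

entry = MorphologicalUnit(
    lemma="γράφω",
    part_of_speech="verb",
    inflectional_paradigm="V-example",
    stems=["γράφ", "γράψ"],
    syntactic_units=[
        SyntacticUnit([subject]),
        SyntacticUnit([subject, direct_object]),
    ],
)
print(len(entry.syntactic_units))
```

A lemma that carries more than one syntactic unit, as in this example, is the reason the lexicon can hold more syntactic units than morphological units.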

Lemma distribution per grammatical category at each level

Morphological level

Grammatical category    Number of entries
Noun                    12,402
Verb                    3,014
Adjective               3,405
Adverb                  1,396
Numeral                 106
Pronoun                 45
Article                 2
Preposition             48
Conjunction             51
Interjection            21
Particle                19
TOTAL                   20,149

Syntactic level

Grammatical category    Number of entries
Noun                    14,548
Verb                    5,397
Adjective               3,558
Adverb                  1,410
Preposition             73
Numeral                 106
TOTAL                   25,092
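
One way to read the two tables together is to compute the average number of syntactic units per morphological unit for each grammatical category. The snippet below is a small arithmetic sketch using the figures from the tables; the variable names are illustrative, and only categories that appear at both levels are compared.

```python
# Entry counts per grammatical category, copied from the two tables above.
morphological = {
    "Noun": 12402, "Verb": 3014, "Adjective": 3405,
    "Adverb": 1396, "Preposition": 48, "Numeral": 106,
}
syntactic = {
    "Noun": 14548, "Verb": 5397, "Adjective": 3558,
    "Adverb": 1410, "Preposition": 73, "Numeral": 106,
}

# Average number of syntactic units per morphological unit, by category.
ratios = {cat: syntactic[cat] / morphological[cat] for cat in syntactic}

# Print categories from the highest ratio to the lowest.
for cat, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<12} {ratio:.2f}")
```

On these figures verbs have the highest ratio, at roughly 1.8 syntactic units per lemma, while numerals map one to one.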

For more information and lexicon samples, please visit the PAROLE web site.