Project - Parole

Parole

Start date:

01-04-1996

Funded by:

LE II (LE2 4017 - 10379)

Project leader:

Maria Gavrilidou

The aim of the PAROLE project was the compilation of large, generic and re-usable Written Language Resources for all EU Languages, comprising more specifically:

general language text corpora of 20,000,000 words in 14 languages (Belgian French, Catalan, Danish, Dutch, English, French, Finnish, German, Greek, Irish, Italian, Norwegian, Portuguese and Swedish), and
computational lexicons of 20,000 lemmas in 12 languages (Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish).

The value of these resources lies not only in their size and in the number of languages covered by the project, but also in the fact that they were built according to common standards and specifications:

As regards the text corpora, all were compiled and annotated following the same guidelines:
- texts were selected on the basis of common parameters: time of production (after 1970) and proportionate representation of the textual material by publication medium (Book, Newspaper, Periodical and Miscellaneous);
- all texts were annotated in the same mark-up format (the PAROLE DTD) for bibliographical information and text structure (annotation down to the level of the paragraph);
- a subset of the corpus (250,000 words) was morphosyntactically annotated according to a common core PAROLE tagset, extended with a set of language-specific features.
As regards the lexica, harmonisation was achieved by developing a common model (the PAROLE model) that caters for the encoding of morphological and syntactic information in all languages. All the lexicons were therefore built according to the same design principles and linguistic specifications, and are encoded in the same representation format.
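
The two text selection parameters described above (time of production and publication medium) can be illustrated with a small sketch. It is not part of the PAROLE tooling or the PAROLE DTD: the `TextRecord` class, the `is_eligible` and `medium_shares` functions and the sample data are all hypothetical, and "after 1970" is assumed here to mean a production year later than 1970.

```python
from collections import Counter
from dataclasses import dataclass

# The four publication media named in the project description.
MEDIA = ("Book", "Newspaper", "Periodical", "Miscellaneous")


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical record for one candidate text (not a PAROLE format)."""
    title: str
    year: int      # year of production
    medium: str    # expected to be one of MEDIA
    words: int     # word count of the text


def is_eligible(text: TextRecord) -> bool:
    """Apply the two stated parameters: produced after 1970, known medium."""
    return text.year > 1970 and text.medium in MEDIA


def medium_shares(texts: list[TextRecord]) -> dict[str, float]:
    """Return each medium's share of the word count of the eligible texts."""
    words = Counter()
    for text in texts:
        if is_eligible(text):
            words[text.medium] += text.words
    total = sum(words.values())
    return {m: (words[m] / total if total else 0.0) for m in MEDIA}


# Invented sample data, for illustration only.
candidates = [
    TextRecord("novel", 1985, "Book", 60_000),
    TextRecord("daily paper", 1994, "Newspaper", 30_000),
    TextRecord("old essay", 1962, "Periodical", 10_000),  # too early: excluded
    TextRecord("leaflet", 1990, "Miscellaneous", 10_000),
]
shares = medium_shares(candidates)
```

A check of this kind would let a corpus builder compare the actual share of each medium against whatever target proportions were agreed; the description above does not state those targets, so none are assumed here.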

Since the completion of the project, a subset of the resources for each language has been available to the research community, either through the European Language Resources Association (ELRA) or directly from the project participants:

a subset of the text corpus (3,000,000 words), including the morphosyntactically annotated subcorpus, and
the computational lexicon.

For more information on the project, please visit the PAROLE/SIMPLE web site: http://www.ub.es/gilcub/SIMPLE/simple.html.

Research areas

Language Resources Infrastructure

Products

Computational morphological and syntactic lexicon of Modern Greek

Project Partners:

Center for Language Technology (Denmark)
Centro de Linguística da Universidade de Lisboa (Portugal)
Department of General Linguistics, University of Helsinki (Finland)
Department of Swedish, Språkdata, University of Gothenburg (Sweden)
Det Danske Sprog-og Litteraturselskab (Denmark)
Erli (France)
Fundacion Bosch Gimpera, Universitat de Barcelona (Spain)
Institut d'Estudis Catalans (Spain)
Institut für Deutsche Sprache (Germany)
Institut National de la Langue Française, CNRS (France)
Institute for Dutch Lexicology (The Netherlands)
Institute for Language, Speech and Hearing, University of Sheffield (U.K.)
Institute Teangelaiochta Eireann (Ireland)
Instituto de Engenharia de Sistemas e Computadores (Portugal)
Université de Liège BELTEXT (Belgium)
University of Birmingham (U.K.)
University of Pisa (Italy)
University of Sheffield (U.K.)
Institute for Language and Speech Processing, R.C. "Athena" (Greece)
