A Greek Corpus of Aphasic Discourse: Collection, Transcription, and Annotation Specifications
|Authors:||S. Varlokosta; Spyridoula Stamouli; A. Karasimos; G. Markopoulos; M. Kakavoulia; M. Nerantzini; A. Pantoula; V. Fyndanis; A. Economou; A. Protopapas|
|Book title:||Proceedings of the 10th Language Resources and Evaluation Conference (LREC-2016), Workshop “Resources and Processing of linguistic and extra-linguistic data from people with various forms of cognitive/psychiatric impairments (RaPID-2016)”|
|Date:||May 23, 2016|
This paper presents the design of an annotated Greek Corpus of Aphasic Discourse (GREECAD). Given that resources of this kind are quite limited, a major aim of the GREECAD was to provide a set of specifications that could serve as a methodological basis for the development of other relevant corpora and thereby contribute to future research in this area. The GREECAD was developed with the following requirements: a) to include a rather homogeneous sample of Greek as spoken by individuals with aphasia; b) to document speech samples with rich metadata, which include demographic information as well as detailed information on the patients’ medical records and neuropsychological evaluations; c) to provide annotated speech samples, which encode information at the micro-linguistic level (words, POS, grammatical errors, clause types, etc.) and the discourse level (narrative structure elements, main events, evaluation devices, etc.). For the design of the GREECAD, the basic requirements regarding data collection, metadata, transcription, and annotation procedures were set. The discourse samples were transcribed and annotated with the ELAN tool. To ensure accurate and consistent annotation, a Transcription and Annotation Guide was compiled, which includes detailed guidelines on all aspects of the transcription and annotation procedure.