Publication - Discriminating the Registers and Styles in the Modern Greek Language-Part 2: Extending the Feature Vector to Optimize Author Discrimination

Research area: Other topics in Informatics
Type: Journal article

Year: 2004
Authors: Γιώργος Ταμπουρατζής; Στέλλα Μαρκαντωνάτου; N. Hairetakis; Μαρίνα Βασιλείου; Γεώργιος Καραγιάννης; D. Tambouratzis
Journal: Literary and Linguistic Computing
Volume: 19
Issue: 2
Pages: 221-242
DOI: 10.1093/llc/19.2.221
Abstract:
This article describes a method for discriminating among authors within a given register of Modern Greek. The focus here is to determine to what extent the stylistic differences among authors can be detected with a high degree of accuracy for a set of texts belonging to a well-defined register. To that end, the chosen register is characterized by a well-defined sub-language, from which a corpus of more than 1,000 documents has been created. To discriminate the texts according to author style, a series of experiments have been performed using statistical techniques. Each text has been represented by a vector covering several linguistic aspects, in an effort to determine the most effective style markers. The experimental results indicate that the proposed approach can successfully separate the author styles for a given register. An extensive study of the effectiveness of the different variable categories has been performed. For instance, diglossia information on its own is not sufficient for author discrimination. Instead, a systematic evaluation process indicates that part-of-speech, structural and algorithmically derived lemma-frequency variables are the most important style markers, their use leading to an author discrimination accuracy exceeding 90%.