A readability statistical model for pedagogically relevant text retrieval
|Editors:||Παπαδοπούλου Δέσποινα, Ρεβυθιάδου Ανθή|
|Book title:||Proceedings of the 32nd Annual Meeting of the Department of Linguistics, AUTH|
|Series:||Μελέτες για την Ελληνική γλώσσα (Studies in Greek Linguistics)|
This paper addresses the readability of Greek texts in the context of first-language education. It investigates a large number of linguistic features and uses discriminant analysis to classify texts into two readability levels: suitable and not suitable for junior high school students. The resulting model draws on grammar and vocabulary features: word length, lexical density, Guiraud's R, sentence length, number of conjunctions, number of quantifiers and nouns, average width of syntactic trees, and number of passive verbs. The model was validated on two different test sets whose readability levels had been estimated by educators and by 913 junior high school students, and it reliably predicted both the educators' and the students' estimates of readability.
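To make some of the surface features listed above concrete, the following is a minimal Python sketch of three of them: mean word length, mean sentence length, and Guiraud's R (the number of distinct word types divided by the square root of the number of tokens). It is not the paper's implementation. The regex tokenizer, the sentence-splitting rule, and all function names are assumptions made for illustration. Features that need linguistic annotation (lexical density, passive verbs, syntactic tree width) and the discriminant analysis itself are left out.

```python
import math
import re


def tokenize(text):
    # Assumption: a token is a lowercased run of word characters.
    # The paper does not specify its tokenization.
    return re.findall(r"\w+", text.lower())


def guiraud_r(tokens):
    # Guiraud's R = distinct types / sqrt(total tokens).
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))


def mean_word_length(tokens):
    # Average number of characters per token.
    if not tokens:
        return 0.0
    return sum(len(t) for t in tokens) / len(tokens)


def mean_sentence_length(text):
    # Assumption: sentences end at '.', '!', '?' or ';'
    # (';' is included because it is the Greek question mark).
    parts = re.split(r"[.!?;]+", text)
    counts = [len(tokenize(p)) for p in parts]
    counts = [c for c in counts if c > 0]  # drop empty fragments
    if not counts:
        return 0.0
    return sum(counts) / len(counts)


if __name__ == "__main__":
    sample = "Η γάτα κοιμάται. Η γάτα τρώει."
    tokens = tokenize(sample)
    print("Guiraud's R:", guiraud_r(tokens))
    print("Mean word length:", mean_word_length(tokens))
    print("Mean sentence length:", mean_sentence_length(sample))
```

In a pipeline like the one the abstract describes, values such as these would form part of each text's feature vector, which a discriminant classifier would then use to separate suitable from not-suitable texts.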