Publication - A comparative study on authorship attribution classification tasks using both neural network and statistical methods

Research area: Other topics in Computer Science
Type: Journal article

Year: 2010
Authors: N. Tsimboukakis; George Tambouratzis
Journal: Neural Computing & Applications
Volume: 19
Issue: 4
Pages: 573-582
DOI: 10.1007/s00521-009-0314-7
Abstract:
The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis of this article is to determine the extent to which the MLP model can be fine-tuned to successfully analyse such data, uncovering the stylistic differences among authors. The MLP-based method is compared and contrasted to statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The comparison of the methods is based on their classification performance, to provide an objective evaluation of the advantages of each method. A second aim of the study presented here is to compare the effectiveness of distinct features in the task of uncovering the author identity for each method. To evaluate to a greater depth the effectiveness of the entire approach, the results of the proposed MLP-based method are compared to those of established approaches, such as the support vector machines (SVM), using both the original parameters employed by the MLP as well as term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. It is found that the proposed MLP-based approach possesses a number of advantages, such as high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting the classification accuracy.