A comparative study on authorship attribution classification tasks using both neural network and statistical methods

Research Area: Other topics in Computer Science
Type: Journal article
Year: 2010
Authors: N. Tsimboukakis; George Tambouratzis
Journal: Neural Computing & Applications
Volume: 19
Number: 4
Pages: 573-582
DOI: 10.1007/s00521-009-0314-7
Abstract:
The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis of this article is on determining the extent to which the MLP model can be fine-tuned to analyse such data successfully, uncovering the stylistic differences among authors. The MLP-based method is compared and contrasted with statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The comparison is based on classification performance, so as to provide an objective evaluation of the advantages of each method. A second aim of the study is to compare, for each method, the effectiveness of distinct features in uncovering the author's identity. To evaluate the effectiveness of the entire approach in greater depth, the results of the proposed MLP-based method are compared to those of established approaches: support vector machines (SVM), using both the original parameters employed by the MLP and term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. The proposed MLP-based approach is found to possess a number of advantages, including high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting classification accuracy.
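The abstract mentions TF–IDF parameters as an alternative input for the SVM comparison, but it does not state which TF–IDF variant was used. As a minimal sketch, assuming the common raw-count term frequency and the unsmoothed inverse document frequency idf = log(N / df), the weighting of tokenised documents could be computed as follows. The function name `tf_idf` is hypothetical and does not come from the paper.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each tokenised document by tf * log(N / df).

    Assumption: raw-count tf and unsmoothed idf; the variant actually used
    in the paper is not given in the abstract.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw count of each term within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


docs = [["the", "ship", "sailed"], ["the", "storm", "broke"], ["the", "ship", "sank"]]
w = tf_idf(docs)
print(w[0])
```

Under this variant, a term occurring in every document (here "the") receives zero weight. Smoothed idf variants avoid this by adding a constant to the ratio or to the logarithm.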