PUBLICATION RECORD
Word map systems for content-based document classification
Research area:  
Other Informatics topics
Type:  
Journal article
Year: | 2011 |
---|---|
Authors: | N. Tsimboukakis; George Tambouratzis |
Journal: | IEEE Transactions on Systems, Man & Cybernetics – Part C |
Pages: | in print |
DOI: | 10.1109/TSMCC.2010.2096416 |
Abstract: | The main purpose of this paper is the classification of documents in terms of their content. Two systems are presented here that share a two-level architecture that includes 1) a word map created via unsupervised learning that functions as a document-representation module and 2) a supervised multilayer-perceptron-based classifier. Two approaches to creating word maps are presented and compared; these are based on hidden Markov models (HMMs) and the self-organizing map. A series of experiments is performed on several datasets of text-only documents, which are written in either Greek or English. A comparison with established methods, such as the support-vector machine (SVM), illustrates the effectiveness of the proposed systems. |