Comparative analysis of text categorization algorithms

dc.contributor.author: Adewole, A.P.
dc.contributor.author: Omitiran, D.M.
dc.date.accessioned: 2019-09-10T12:07:54Z
dc.date.available: 2019-09-10T12:07:54Z
dc.date.issued: 2017
dc.description: Staff publications [en_US]
dc.description.abstract: Text categorization (also known as text classification) is the task of automatically assigning documents to one or more categories from a pre-specified set. The task has several applications, including spam filtering, identification of document genre, automated indexing of scientific articles according to predefined thesauri of technical terms, and the automated extraction of metadata. Text categorization matters because unstructured text is the largest readily available source of data, and organizing it manually is infeasible given the large number of documents involved and the time constraints. The accuracy of modern text categorization systems rivals that of trained human professionals. This study experimentally compared four machine learning classifiers used in text categorization: Naïve Bayes, Decision Trees, k-Nearest Neighbour (kNN) and Support Vector Machines (SVM). The classifiers were developed in the Python programming language. When run on the Reuters dataset, SVM significantly outperformed Naïve Bayes, kNN and Decision Trees. Decision Trees performed worst of the four algorithms considered. Observations made while running the experiments suggest a trade-off between simplicity and effectiveness. In conclusion, the results of this comparative analysis show that SVM is the most effective of the classifiers considered in this study. [en_US]
dc.identifier.citation: Adewole, A.P. and Omitiran, D.M. (2017). Comparative analysis of text categorization algorithms. The journal of computer science and its applications, Nigeria computer society, Vol. 24(2): 120-133 [en_US]
dc.identifier.uri: https://ir.unilag.edu.ng/handle/123456789/5463
dc.language.iso: en [en_US]
dc.publisher: The journal of computer science and its applications, Nigeria computer society [en_US]
dc.relation.ispartofseries: The journal of computer science and its applications, Nigeria computer society;Vol.24(2)
dc.subject: Classifier [en_US]
dc.subject: Decision trees [en_US]
dc.subject: k-Nearest Neighbour (kNN) [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Naïve Bayes [en_US]
dc.subject: Support Vector Machines (SVM) [en_US]
dc.subject: Text categorization [en_US]
dc.subject: Text classification [en_US]
dc.subject: Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science [en_US]
dc.title: Comparative analysis of text categorization algorithms [en_US]
dc.type: Article [en_US]
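
The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which libraries or settings they used. The sketch assumes a bag-of-words representation, implements only two of the four classifiers (Naïve Bayes and kNN) using the Python standard library, and runs on a tiny hand-made corpus rather than the Reuters dataset. All names in it (tokenize, NaiveBayes, KNearest, compare, the toy documents and their labels) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        tokens = [w for w in tokenize(doc) if w in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            scores[label] = math.log(n_docs / total_docs) + sum(
                math.log((counts[w] + 1) / denom) for w in tokens
            )
        return max(scores, key=scores.get)


class KNearest:
    """k-nearest neighbours using cosine similarity over raw term counts."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        self.items = [(Counter(tokenize(d)), y) for d, y in zip(docs, labels)]
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(self, doc):
        query = Counter(tokenize(doc))
        ranked = sorted(
            self.items, key=lambda item: self._cosine(query, item[0]), reverse=True
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def compare(train_docs, train_labels, test_docs, test_labels):
    """Train each classifier on the same split and report test accuracy."""
    models = {"Naive Bayes": NaiveBayes(), "kNN": KNearest(k=3)}
    return {
        name: accuracy(model.fit(train_docs, train_labels), test_docs, test_labels)
        for name, model in models.items()
    }


# Toy corpus for illustration only; it is not drawn from the Reuters dataset.
TRAIN_DOCS = [
    "wheat harvest rises as grain exports grow",
    "corn and wheat prices fall on grain surplus",
    "grain shipment of corn delayed at port",
    "crude oil prices climb as opec cuts output",
    "oil refinery output falls after crude shortage",
    "opec ministers discuss crude oil quotas",
]
TRAIN_LABELS = ["grain", "grain", "grain", "crude", "crude", "crude"]
TEST_DOCS = ["wheat and corn exports rise", "opec raises crude oil output"]
TEST_LABELS = ["grain", "crude"]

if __name__ == "__main__":
    for name, score in compare(TRAIN_DOCS, TRAIN_LABELS, TEST_DOCS, TEST_LABELS).items():
        print(f"{name}: accuracy = {score:.2f}")
```

Training every classifier on the same split and scoring it with the same metric is what makes the numbers comparable. A corpus this small says nothing about the relative ranking reported in the abstract, which rests on the authors' Reuters experiments.
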
Files
Original bundle
Name: ADEWOLE17_TEXT_CATEGORIZATION.pdf
Size: 293.56 KB
Format: Adobe Portable Document Format
Description: Main article