Abstract:
Automatic indexing and text retrieval methods have been studied for a long time. Compared to other languages, however, research on automated Arabic text categorization remains limited. In this work, we present a method to improve the accuracy of automatic indexing of Arabic texts by introducing a thesaurus. Our model extracts new relevant words by referring to this thesaurus, which identifies correlations between words. The thesaurus is built with the NLTK toolkit, which includes a library that lists the synonyms of a given word available in WordNet. Words that have the same meaning and frequently appear together are grouped under one umbrella term using a JSON dictionary, making it easier to identify a text's topic. Our results show a notable improvement in accuracy and efficiency compared to previous works.
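
The grouping step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `build_thesaurus`, `index_text`, and `toy_lookup`, as well as the small hand-written synonym table, are hypothetical and chosen only for illustration. The synonym lookup is passed in as a function, so a WordNet-backed lookup (for example, one built on NLTK) could be substituted for the toy table. The co-occurrence criterion is omitted, and only the synonym grouping and the JSON dictionary are shown.

```python
import json
from collections import Counter
from typing import Callable, Dict, Iterable, List


def build_thesaurus(
    vocabulary: Iterable[str],
    synonyms_of: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Group each word and its synonyms under one head ("umbrella") word."""
    head_of: Dict[str, str] = {}       # word -> head word of its group
    groups: Dict[str, List[str]] = {}  # head word -> members of the group
    for word in vocabulary:
        if word in head_of:
            continue  # already absorbed into an earlier group
        head_of[word] = word
        groups[word] = [word]
        for synonym in synonyms_of(word):
            if synonym not in head_of:
                head_of[synonym] = word
                groups[word].append(synonym)
    return groups


def index_text(tokens: Iterable[str], thesaurus: Dict[str, List[str]]) -> Counter:
    """Count tokens after mapping each one to the head word of its group."""
    head_of = {member: head for head, members in thesaurus.items() for member in members}
    # Tokens that are not in the thesaurus are counted under their own form.
    return Counter(head_of.get(token, token) for token in tokens)


# Hypothetical toy synonym table standing in for a WordNet lookup (assumption).
TOY_SYNONYMS = {
    "سيارة": ["مركبة"],   # car -> vehicle
    "مركبة": ["سيارة"],   # vehicle -> car
    "طبيب": ["دكتور"],    # physician -> doctor
}


def toy_lookup(word: str) -> List[str]:
    """Return the toy synonyms of a word, or an empty list if none are known."""
    return TOY_SYNONYMS.get(word, [])


thesaurus = build_thesaurus(["سيارة", "مركبة", "طبيب"], toy_lookup)

# Store the thesaurus as a JSON dictionary, keeping the Arabic text readable.
thesaurus_json = json.dumps(thesaurus, ensure_ascii=False, indent=2)

# Index a tokenized text against the thesaurus reloaded from JSON.
counts = index_text(["مركبة", "سيارة", "دكتور", "كتاب"], json.loads(thesaurus_json))

print(thesaurus_json)
print(counts.most_common())
```

Mapping every token to the head word of its group concentrates the counts of synonymous words on a single index term, which is one plausible reading of how grouping under one umbrella term makes a text's topic easier to identify.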