
Supervised term-category feature weighting for improved text classification

LAUR Repository


dc.contributor.author Attieh, Joseph
dc.contributor.author Tekli, Joe
dc.date.accessioned 2024-08-20T10:08:25Z
dc.date.available 2024-08-20T10:08:25Z
dc.date.copyright 2023 en_US
dc.date.issued 2022-12-28
dc.identifier.issn 0950-7051 en_US
dc.identifier.uri http://hdl.handle.net/10725/15996
dc.description.abstract Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and map them with their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable method for term weighting is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering titled CFE. It consists of a supervised weighting scheme defined based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates until reaching convergence. GradientDescentANN replaces the iterative additive process mentioned previously by computing the term-category matrix using a gradient descent ANN model. Training the ANN using the gradient descent algorithm allows updating the term-category matrix until reaching convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors, and are associated with the most similar categories. We have implemented CFE including its three classification approaches, and we have conducted a large battery of tests to evaluate their performance. Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time compared with their deep model alternatives. en_US
dc.language.iso en en_US
dc.title Supervised term-category feature weighting for improved text classification en_US
dc.type Article en_US
dc.description.version Published en_US
dc.author.school SOE en_US
dc.author.idnumber 201306321 en_US
dc.author.department Electrical And Computer Engineering en_US
dc.relation.journal Knowledge-Based Systems en_US
dc.journal.volume 261 en_US
dc.keywords Text classification en_US
dc.keywords Document and text processing en_US
dc.keywords Feature Engineering en_US
dc.keywords Supervised term weighting en_US
dc.keywords Inverse Category Frequency en_US
dc.keywords TF-IDF en_US
dc.keywords Text representation en_US
dc.identifier.doi https://doi.org/10.1016/j.knosys.2022.110215 en_US
dc.identifier.ctation Attieh, J., & Tekli, J. (2023). Supervised term-category feature weighting for improved text classification. Knowledge-Based Systems, 261, 110215. en_US
dc.author.email joe.tekli@lau.edu.lb en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php en_US
dc.identifier.url https://www.sciencedirect.com/science/article/pii/S0950705122013119 en_US
dc.orcid.id https://orcid.org/0000-0003-3441-7974 en_US
dc.author.affiliation Lebanese American University en_US
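
The abstract describes a term-category matrix built from TF-ICF weights. As a rough illustration of that general idea only, the sketch below computes a textbook-style TF-ICF matrix and assigns a document to the category with the highest summed weight. It is not the CFE variant from the article: the formula tf * log(1 + C / cf), the function names tf_icf and classify, the scoring rule, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tf_icf(docs, labels):
    """Return {category: {term: weight}} with weight = tf * log(1 + C / cf).

    Assumed generic formulation, not the variant defined in the article.
    """
    # Term frequency per category: pool the counts of all documents in a category.
    tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)
    n_categories = len(tf)
    # Category frequency: number of categories in which each term occurs.
    cf = Counter(term for counts in tf.values() for term in counts)
    return {
        cat: {t: n * math.log(1 + n_categories / cf[t]) for t, n in counts.items()}
        for cat, counts in tf.items()
    }


def classify(tokens, weights):
    """Pick the category whose TF-ICF weights sum highest over the tokens."""
    scores = {cat: sum(w.get(t, 0.0) for t in tokens) for cat, w in weights.items()}
    return max(scores, key=scores.get)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tf_icf(docs, labels)
print(classify(["goal", "team"], weights))
print(classify(["market", "price"], weights))
```

Terms that occur in few categories (here "goal" or "market") receive a larger inverse-category factor than terms shared across categories (here "team"), which is the intuition behind category-based weighting.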

