Abstract:
Text classification is a key task in Natural Language Processing (NLP) that assigns predefined categories to textual documents. It requires features that effectively represent the content and meaning of those documents. Selecting a suitable term weighting method is therefore of central importance and can improve the quality of the classification. In this paper, we propose a new text classification solution that performs Category-based Feature Augmentation (CFA) on the document representation. First, a term-category feature matrix is derived from a modified version of the supervised Term-Frequency Inverse-Category-Frequency (TF-ICF) weighting model. This is done by embedding the TF-ICF matrix in a one-layer feed-forward neural network. The network is trained with gradient descent, which iteratively updates the term-category matrix until convergence. The model produces category-based feature vector representations that are used to augment the document representations and perform the classification task. Experimental results on four benchmark datasets show that our lean approach improves text classification accuracy and is significantly more efficient than deep model alternatives.
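
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the modification made to TF-ICF, the loss function, or how the augmentation is done, so the sketch assumes a plain TF-ICF weighting of the form tf(t, c) * log(1 + C / cf(t)), a softmax cross-entropy loss, and augmentation by concatenation. All function names (`tf_icf_matrix`, `train_term_category`, `augment`) are hypothetical.

```python
import numpy as np

def tf_icf_matrix(X, y, n_categories):
    """Term-category matrix (V x C). Assumed plain TF-ICF, not the paper's modified variant."""
    Y = np.eye(n_categories)[y]                 # one-hot labels, shape (N, C)
    tf = X.T @ Y                                # term counts per category, shape (V, C)
    cf = np.count_nonzero(tf, axis=1)           # number of categories containing each term
    icf = np.log(1.0 + n_categories / np.maximum(cf, 1))
    return tf * icf[:, None]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_term_category(X, y, n_categories, lr=0.1, epochs=200):
    """One-layer network whose weights start as the TF-ICF matrix (assumed cross-entropy loss)."""
    Y = np.eye(n_categories)[y]
    W = tf_icf_matrix(X, y, n_categories)       # the weight matrix is the term-category matrix
    losses = []
    for _ in range(epochs):
        P = softmax(X @ W)                      # forward pass: category scores per document
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        W -= lr * X.T @ (P - Y) / len(X)        # gradient descent step on the matrix
    return W, losses

def augment(X, W):
    """Append category-based features (X @ W) to the original document vectors (assumed concatenation)."""
    return np.hstack([X, X @ W])

# Toy bag-of-words counts: 4 documents, 4 terms, 2 categories.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 1],
              [0, 0, 2, 1],
              [0, 1, 1, 2]], dtype=float)
y = np.array([0, 0, 1, 1])

W, losses = train_term_category(X, y, n_categories=2)
X_aug = augment(X, W)                           # shape (4, 4 + 2)
```

Here the trained matrix `W` plays the role of the term-category feature matrix, and `X_aug` is the augmented document representation that a downstream classifier would consume.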
Citation:
Attieh, J., & Tekli, J. (2023, November). Fast Text Classification using Lean Gradient Descent Feed Forward Neural Network for Category Feature Augmentation. In 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (pp. 2341-2348). IEEE.