Abstract:
Large texts are not always entirely meaningful: they may contain repetitions and superfluous details, and can be hard for humans to interpret. Automatic text summarization aims to simplify a text by making it shorter and (possibly) more informative. This paper describes a new solution for extractive text summarization, designed to efficiently process flat (unstructured) text. It performs unsupervised frequency-based document processing to identify the candidate sentences most likely to carry informative content in the document. It introduces a dedicated feature vector representation for sentences that captures the relative impact of the different terms in each sentence. The sentence feature vectors are then run through a partitional k-means clustering process, and the extractive summary is built from the cluster representatives. Experimental results highlight the quality and efficiency of our approach.
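The pipeline outlined in the abstract (frequency-based term weighting, sentence feature vectors, k-means clustering, one representative per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature vector, so the weighting used here (in-sentence term count multiplied by document-level term frequency, then length-normalized) is an assumption, as are the naive sentence splitter, the deterministic seeding, and all function names.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_vectors(sentences):
    # Assumed weighting: in-sentence count times document-level frequency.
    doc_tf = Counter(t for s in sentences for t in tokenize(s))
    vocab = sorted(doc_tf)
    vectors = []
    for s in sentences:
        counts = Counter(tokenize(s))
        vec = [counts[t] * doc_tf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])  # unit length
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    # Label each vector with the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors]


def kmeans(vectors, k, iterations=20):
    # Deterministic seeding (assumption): evenly spaced sentences as seeds.
    step = len(vectors) / k
    centroids = [list(vectors[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centroids), centroids


def summarize(text, k=2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = max(1, min(k, len(sentences)))
    vectors = sentence_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            # Cluster representative: the sentence closest to the centroid.
            chosen.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    # Return representatives in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(text, k=3)` returns at most three sentences, one per non-empty cluster, in document order. A production version would more likely rely on an established k-means implementation and a proper sentence tokenizer; plain Python is used here only to keep the sketch self-contained.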
Citation:
Hajjar, A., & Tekli, J. (2022, August). Unsupervised extractive text summarization using frequency-based sentence clustering. In European Conference on Advances in Databases and Information Systems (pp. 245-255). Cham: Springer International Publishing.