Abstract:
Although Information Retrieval (IR) systems have succeeded in Web-style search engines over the past two decades, the DataBase (DB) paradigm remains prevalent for handling data in enterprise environments and digital libraries, and is gaining even more importance in the Semantic Web with the increasing need to handle partly structured (NoSQL) data. This paper describes SemIndex+, a semantic-aware indexing and querying framework that allows semantic search, result selection, and result ranking of structured (relational DB-style), unstructured (IR-style), and partly structured (NoSQL) data. Various weighting functions and a parallelized search algorithm have been developed for that purpose and are presented here. We provide a general keyword query model allowing the user to choose the results’ semantic coverage and expressiveness based on her needs. Unlike alternative solutions involving query relaxation, query refinement, or query disambiguation, our approach incorporates semantics at the most basic data indexing level, providing more opportunities for speedups and semantic coverage. An extensive experimental evaluation, comparing SemIndex+ with alternative methods, highlights our approach’s flexibility and effectiveness, which in turn impact efficiency (requiring more or less time depending on the user-specified index and query semantic coverages).
Citation:
Tekli, J., Chbeir, R., Traina, A. J., & Traina Jr, C. (2019). SemIndex+: A semantic indexing scheme for structured, unstructured, and partly structured data. Knowledge-Based Systems, 164, 378-403.