A hybrid approach for XML similarity

Tekli, Joe; Chbeir, Richard; Yetongnon, Kokou

URI: http://hdl.handle.net/10725/7058

URL: https://link.springer.com/chapter/10.1007/978-3-540-69507-3_68

DOI: https://doi.org/10.1007/978-3-540-69507-3_68

Date: 2007

Terms of Use: This item is made available under the terms and conditions applicable to "Conference Paper / Proceeding", as set forth at: http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php

Abstract:

In the past few years, XML has been established as an effective means for information management, and has been widely exploited for complex data representation. Owing to the rapidly increasing use of the XML standard, developing efficient techniques for comparing XML-based documents has become essential in information retrieval (IR) research. Various algorithms for comparing hierarchically structured data, e.g. XML documents, have been proposed in the literature. However, to our knowledge, most of them focus exclusively on comparing documents based on structural features, overlooking the semantics involved. In this paper, we integrate IR semantic similarity assessment into an edit distance algorithm, seeking to amend similarity judgments when comparing XML-based documents. Our approach consists of an original edit distance operation cost model that introduces the semantic relatedness of XML element/attribute labels into traditional edit distance computations. A prototype has been developed to evaluate our model’s performance. Experiments yielded notable results.
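
The general idea of letting label semantics influence edit costs can be illustrated with a minimal sketch. This is not the cost model from the paper, which the abstract does not specify. It assumes, purely for illustration, that each document is flattened to a sequence of element labels (a simplification of tree edit distance), that insertions and deletions cost 1, and that updating one label to another costs 1 minus their relatedness. The names `TOY_RELATEDNESS`, `toy_similarity`, and `semantic_edit_distance` are hypothetical, and the relatedness scores are made-up values.

```python
from typing import Callable, Sequence

# Hypothetical relatedness scores in [0, 1]. These values are invented for
# illustration; a real system would derive them from a knowledge base.
TOY_RELATEDNESS = {
    frozenset(("author", "writer")): 0.9,
    frozenset(("movie", "film")): 0.95,
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical labels, a table score if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


def semantic_edit_distance(
    src: Sequence[str],
    dst: Sequence[str],
    sim: Callable[[str, str], float] = toy_similarity,
) -> float:
    """Edit distance over label sequences with semantics-aware update cost.

    Assumption: insert and delete cost 1.0; update costs 1 - sim(a, b).
    This works on flat sequences, not on trees.
    """
    # prev[j] holds the distance between the processed prefix of src
    # and the first j labels of dst.
    prev = [float(j) for j in range(len(dst) + 1)]
    for i, a in enumerate(src, start=1):
        curr = [float(i)]
        for j, b in enumerate(dst, start=1):
            update = prev[j - 1] + (1.0 - sim(a, b))
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr.append(min(update, delete, insert))
        prev = curr
    return prev[-1]
```

Under these assumptions, the label sequences `["movie", "title", "author"]` and `["film", "title", "writer"]` end up much closer than they would with a purely syntactic update cost, where every differing label costs a full unit.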

Citation:

Tekli, J., Chbeir, R., & Yetongnon, K. (2007, January). A hybrid approach for XML similarity. In International Conference on Current Trends in Theory and Practice of Computer Science (pp. 783-795). Berlin, Heidelberg: Springer Berlin Heidelberg.