An XML Document Comparison Framework

Tekli, Joe; Chbeir, Richard; Yetongnon, Kokou

dc.contributor.author	Tekli, Joe
dc.contributor.author	Chbeir, Richard
dc.contributor.author	Yetongnon, Kokou
dc.date.accessioned	2017-02-01T09:56:13Z
dc.date.available	2017-02-01T09:56:13Z
dc.date.copyright	2001	en_US
dc.date.issued	2001
dc.identifier.issn	0306-4379	en_US
dc.identifier.uri	http://hdl.handle.net/10725/5141
dc.description.abstract	As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as Ordered Labeled Trees. Nevertheless, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison method to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and allow the end-user to tune the comparison process according to her requirements. Our approach consists of four main modules for i) discovering the structural commonalities between sub-trees, ii) identifying sub-tree semantic resemblances, iii) computing tree-based edit operation costs, and iv) computing tree edit distance. A prototype has been developed to evaluate the optimality and performance of our method. Results demonstrate higher comparison accuracy with respect to alternative XML comparison methods, while timing experiments reflect the significant impact of semantic similarity assessment on overall system performance.	en_US
dc.language.iso	en	en_US
dc.title	An XML Document Comparison Framework	en_US
dc.type	Article	en_US
dc.description.version	Published	en_US
dc.author.school	SOE	en_US
dc.author.idnumber	201306321	en_US
dc.author.department	Electrical And Computer Engineering	en_US
dc.description.embargo	N/A	en_US
dc.article.pages	1-47	en_US
dc.keywords	Semi-structured XML-based data	en_US
dc.keywords	Structural Similarity	en_US
dc.keywords	Tree Edit Distance	en_US
dc.keywords	Semantic similarity	en_US
dc.keywords	Information Retrieval	en_US
dc.identifier.ctation	Tekli, J., Chbeir, R., & Yetongnon, K. (2001). An XML Document Comparison Framework.	en_US
dc.author.email	joe.tekli@lau.edu.lb	en_US
dc.identifier.tou	http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php	en_US
dc.identifier.url	https://www.researchgate.net/publication/228963576_An_XML_Document_Comparison_Framework	en_US
dc.orcid.id	https://orcid.org/0000-0003-3441-7974	en_US
dc.author.affiliation	Lebanese American University	en_US