An XML Document Comparison Framework

LAUR Repository

dc.contributor.author Tekli, Joe
dc.contributor.author Chbeir, Richard
dc.contributor.author Yetongnon, Kokou
dc.date.accessioned 2017-02-01T09:56:13Z
dc.date.available 2017-02-01T09:56:13Z
dc.date.copyright 2001 en_US
dc.date.issued 2017-02-01
dc.identifier.issn 0306-4379 en_US
dc.identifier.uri http://hdl.handle.net/10725/5141
dc.description.abstract As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as Ordered Labeled Trees. Nevertheless, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison method to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and allow the end-user to tune the comparison process according to her requirements. Our approach consists of four main modules for i) discovering the structural commonalities between sub-trees, ii) identifying sub-tree semantic resemblances, iii) computing tree-based edit operation costs, and iv) computing the tree edit distance. A prototype has been developed to evaluate the optimality and performance of our method. Results demonstrate higher comparison accuracy with respect to alternative XML comparison methods, while timing experiments reflect the significant impact of semantic similarity assessment on overall system performance. en_US
dc.language.iso en en_US
dc.title An XML Document Comparison Framework en_US
dc.type Article en_US
dc.description.version Published en_US
dc.author.school SOE en_US
dc.author.idnumber 201306321 en_US
dc.author.department Electrical And Computer Engineering en_US
dc.description.embargo N/A en_US
dc.relation.journal Information Systems en_US
dc.article.pages 1-47 en_US
dc.keywords Semi-structured XML-based data en_US
dc.keywords Structural Similarity en_US
dc.keywords Tree Edit Distance en_US
dc.keywords Semantic similarity en_US
dc.keywords Information Retrieval en_US
dc.identifier.ctation Tekli, J., Chbeir, R., & Yetongnon, K. (2008). An XML Document Comparison Framework. Information Systems Journal. en_US
dc.author.email joe.tekli@lau.edu.lb en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php en_US
dc.identifier.url https://www.researchgate.net/profile/Richard_Chbeir/publication/228963576_An_XML_Document_Comparison_Framework/links/0912f50e338c10edfd000000.pdf en_US
dc.author.affiliation Lebanese American University en_US
