
A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

LAUR Repository

dc.contributor.author Tekli, Joe
dc.contributor.author Chbeir, Richard
dc.date.accessioned 2017-01-27T09:24:13Z
dc.date.available 2017-01-27T09:24:13Z
dc.date.copyright 2012 en_US
dc.date.issued 2017-01-27
dc.identifier.issn 1570-8268 en_US
dc.identifier.uri http://hdl.handle.net/10725/5084
dc.description.abstract XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance. en_US
dc.language.iso en en_US
dc.title A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics en_US
dc.type Article en_US
dc.description.version Published en_US
dc.author.school SOE en_US
dc.author.idnumber 201306321 en_US
dc.author.department Electrical And Computer Engineering en_US
dc.description.embargo N/A en_US
dc.relation.journal Web Semantics: Science, Services and Agents on the World Wide Web en_US
dc.journal.volume 11 en_US
dc.article.pages 14-40 en_US
dc.keywords XML (semi-structured) data en_US
dc.keywords Structural similarity en_US
dc.keywords Tree edit distance en_US
dc.keywords Semantic similarity en_US
dc.keywords Information retrieval en_US
dc.keywords Vector space model en_US
dc.identifier.doi http://dx.doi.org/10.1016/j.websem.2011.10.002 en_US
dc.identifier.ctation Tekli, J., & Chbeir, R. (2012). A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 14-40. en_US
dc.author.email joe.tekli@lau.edu.lb en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php en_US
dc.identifier.url http://www.sciencedirect.com/science/article/pii/S1570826811000825 en_US
dc.author.affiliation Lebanese American University en_US
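The abstract's core building block is the edit distance between XML documents modeled as ordered labeled trees. The sketch below is a minimal illustration under stated assumptions. It uses the textbook recursive forest formulation of ordered-tree edit distance with unit costs for node insertion, deletion and relabeling. It is not the authors' framework: it has no sub-tree commonality detection, no semantic label similarity and no tunable costs. The helper names `node`, `from_xml` and `tree_edit_distance` are hypothetical, and the naive memoization is only suitable for small toy documents.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical representation: a tree is (label, children), where children
# is a tuple of trees, so trees are hashable and can be memoized.

def node(label, *children):
    """Build an ordered labeled tree from a label and child trees."""
    return (label, tuple(children))

def from_xml(elem):
    """Convert an ElementTree element into an ordered labeled tree (tags only)."""
    return (elem.tag, tuple(from_xml(child) for child in elem))

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Edit distance between two ordered forests (tuples of trees), unit costs."""
    if not f and not g:
        return 0
    if not g:
        # Delete the root of the rightmost tree of f; its children take its place.
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    if not f:
        # Insert the root of the rightmost tree of g.
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,   # delete rightmost root of f
        forest_dist(f, g[:-1] + kw) + 1,   # insert rightmost root of g
        # Match the two rightmost roots: align the remaining forests and the
        # two child forests separately, plus a relabel cost if labels differ.
        forest_dist(f[:-1], g[:-1]) + forest_dist(kv, kw) + (0 if lv == lw else 1),
    )

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered labeled trees."""
    return forest_dist((t1,), (t2,))
```

For example, `<paper><title/><author/></paper>` and `<paper><title/><year/></paper>` differ by a single relabeling, so `tree_edit_distance(from_xml(ET.fromstring(...)), ...)` yields 1 for that pair. Frameworks like the one described in the abstract refine this basic scheme by adjusting operation costs, for instance according to sub-tree resemblances and label semantics.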
