Approximate XML structure validation based on document–grammar tree similarity

LAUR Repository

dc.contributor.author Tekli, Joe
dc.contributor.author Chbeir, Richard
dc.contributor.author Traina, Agma J.M.
dc.contributor.author Traina, Caetano
dc.contributor.author Fileto, Renato
dc.date.accessioned 2017-01-27T08:21:51Z
dc.date.available 2017-01-27T08:21:51Z
dc.date.copyright 2015 en_US
dc.date.issued 2017-01-27
dc.identifier.issn 0020-0255 en_US
dc.identifier.uri http://hdl.handle.net/10725/5082
dc.description.abstract Comparing XML documents with XML grammars, also known as XML document and grammar validation, is useful in various applications such as XML document classification, document transformation, grammar evolution, XML retrieval, and the selective dissemination of information. While exact (Boolean) XML validation has been extensively investigated in the literature, the more general problem of approximate (similarity-based) XML validation, i.e., document–grammar similarity evaluation, has not yet received strong attention. In this paper, we propose an original method for measuring the structural similarity between an XML document and an XML grammar (DTD or XSD), considering their most common operators that designate constraints on the existence, repeatability and alternativeness of XML elements/attributes (e.g., ?, ∗, MinOccurs, MaxOccurs, etc.). Our approach exploits the concept of tree edit distance, introducing a novel edit distance recurrence and dedicated algorithms to effectively compare XML documents and grammar structures, modeled as ordered labeled trees. Our method also inherently performs exact validation by imposing a maximum similarity threshold (minimum edit distance) on the returned results. We implemented a prototype and conducted several experiments on large sets of real and synthetic XML documents and grammars. Results underline our approach’s effectiveness in classifying similar documents with respect to predefined grammars, accurately detecting document and/or grammar modifications, and performing document and grammar relevance ranking. Time and space analyses were also conducted. en_US
dc.language.iso en en_US
dc.title Approximate XML structure validation based on document–grammar tree similarity en_US
dc.type Article en_US
dc.description.version Published en_US
dc.author.school SOE en_US
dc.author.idnumber 201306321 en_US
dc.author.department Electrical And Computer Engineering en_US
dc.description.embargo N/A en_US
dc.relation.journal Information Sciences en_US
dc.journal.volume 295 en_US
dc.journal.issue 20 en_US
dc.article.pages 258-302 en_US
dc.keywords XML en_US
dc.keywords Semi-structured data en_US
dc.keywords XML grammar en_US
dc.keywords Structural similarity en_US
dc.keywords Tree edit distance en_US
dc.keywords Document classification en_US
dc.identifier.doi http://dx.doi.org/10.1016/j.ins.2014.09.044 en_US
dc.identifier.ctation Tekli, J., Chbeir, R., Traina, A. J., Traina, C., & Fileto, R. (2015). Approximate XML structure validation based on document–grammar tree similarity. Information Sciences, 295, 258-302. en_US
dc.author.email joe.tekli@lau.edu.lb en_US
dc.identifier.tou http://libraries.lau.edu.lb/research/laur/terms-of-use/articles.php en_US
dc.identifier.url http://www.sciencedirect.com/science/article/pii/S0020025514009566 en_US
dc.author.affiliation Lebanese American University en_US
