Abstract:
XML grammar matching has attracted considerable interest recently,
owing to the growing number of heterogeneous XML documents on the web and
the increasing need to integrate, and consequently search and retrieve, XML
documents originating from different data sources. In this study, we provide an
approach for automatic XML matching and comparison aiming to minimize
the amount of user effort required to perform the match task. We propose an
extensible framework based on the concept of tree edit distance, integrating
different matching criteria so as to capture
XML grammar element semantic and syntactic similarities, cardinality and
alternativeness constraints, as well as data-type correspondences and relative
ordering. Our method is not bound to any specific XML grammar language
(e.g., DTD or XSD), and covers all basic operators and constraints. In addition,
our framework is flexible, enabling the user to choose mapping cardinality
(i.e., 1:1, 1:n, n:1, n:n), in contrast to existing static methods (usually
constrained to 1:1). User constraints and feedback are also taken into account in
order to adjust matching results to the user’s perception of correct matches. A
prototype has been developed to evaluate and test our approach. Experiments
on real and synthetic XML grammars demonstrate the efficiency of our
matching strategy in identifying mappings, in comparison with alternative
methods, while timing results underline the impact of semantic similarity
evaluation on overall system performance.
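
To make the notion of tree edit distance concrete, the following minimal sketch compares two small grammar trees. It is not the framework described in this report: it assumes a simplified Selkow-style distance (relabel a node, insert or delete a whole subtree), unit costs, and exact label comparison in place of the semantic, cardinality, and data-type criteria. The names `Node`, `size`, `relabel_cost`, and `tree_distance` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """An ordered, labeled tree node (e.g., one XML grammar element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def relabel_cost(a: str, b: str) -> float:
    # Assumption: exact string match. A richer cost (semantic similarity,
    # data types, cardinality) could be substituted here.
    return 0.0 if a == b else 1.0


def tree_distance(a: Node, b: Node) -> float:
    """Selkow-style edit distance between two ordered trees.

    Allowed operations: relabel a node, insert a subtree, delete a subtree.
    Inserting or deleting a subtree costs its number of nodes.
    """
    m, n = len(a.children), len(b.children)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    d[0][0] = relabel_cost(a.label, b.label)
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])  # delete subtree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])  # insert subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),
                d[i][j - 1] + size(b.children[j - 1]),
                d[i - 1][j - 1]
                + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return d[m][n]


# Two toy grammars: the second renames "author" and adds "year".
g1 = Node("book", [Node("title"), Node("author")])
g2 = Node("book", [Node("title"), Node("writer"), Node("year")])
print(tree_distance(g1, g2))
```

In this toy case the distance counts one relabeling and one inserted leaf. Replacing `relabel_cost` with a graded similarity measure is one way such a distance could be made sensitive to semantically related element names.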
Citation:
Tekli, J., Chbeir, R., & Yetongnon, K. XML Grammar Matching and Comparison: Technical Report.