Abstract:
XML grammar matching has found considerable interest recently, due to the growing number of heterogeneous XML documents on the Web, and the need to integrate, search and retrieve XML documents originated from different data sources. In this study, we provide an approach for automatic XML grammar matching and comparison aiming to minimize the amount of user effort required to perform the match task. This requires (i) considering the various characteristics and constraints of XML grammars (in comparison with ‘grammar simplifying’ approaches), (ii) allowing a flexible combination of different matching criteria (in comparison with static approaches), and (iii) effectively considering the semi-structured nature of XML (in contrast with heuristic methods). To achieve this, we propose an extensible framework based on the concept of tree edit distance as an optimal technique to consider XML structure, integrating different matching criteria to capture all basic XML grammar characteristics, ranging over element semantic and syntactic similarities, cardinality and alternativeness constraints, as well as data-type correspondences and relative ordering. In addition, our framework is flexible, enabling the user to choose mapping cardinality (i.e., 1:1, 1:n, n:1, n:n), in comparison with exiting static methods (usually constrained to 1:1). User constraints and feedback are equally considered in order to adjust matching results to the user’s perception of correct matches. Experiments on real and synthetic XML grammars demonstrate the effectiveness and efficiency of our matching strategy in identifying mappings, in comparison with alternative methods.
Citation:
Tekli, J., & Chbeir, R. (2012). Minimizing user effort in XML grammar matching. Information Sciences, 210, 1-40.