Abstract:
As data volumes grow rapidly across many fields, extracting and understanding the correlations between different data features is becoming increasingly important in application areas such as business, demographics, politics, and medicine. Exploiting such data properly poses major challenges for data analysis and visualization, which must support effective and efficient decision-making. The problem is aggravated on the Web, where data is often loosely structured and multi-featured. In this context, interactive data visualization has recently been suggested as a promising way to facilitate data analysis and help unveil patterns, trends, and anomalies in the data. In this research, we present a new unsupervised feature-based tool for data visualization called “mirrored dendrograms”. Our tool accepts semi-structured and multi-featured data as input, and allows the user to select the target features to be visualized and mapped against each other, as well as their relative impacts (weights) on the visualization process. It then invokes a hierarchical clustering process to cluster the data according to the user-chosen features, and produces a dendrogram structure for each combination of target features. The dendrograms are then mirrored against each other by mapping the internal nodes of one dendrogram to the nodes of the other, in order to highlight their structural correlation.
Unlike existing solutions such as parallel coordinates and heatmap dendrograms, our work offers three main contributions: (i) connecting the dendrograms through their internal nodes (instead of connecting their leaf nodes, as in heatmap dendrograms and tanglegrams), providing a visual representation of the relationships between the data structures, (ii) allowing users to zoom in and out of the data to show their relationships at different granularity levels (in contrast to existing static solutions, which do not allow varying granularities), and (iii) identifying the best zooming level between two dendrograms, i.e., the level that highlights the maximum correlation (similarity) with the minimal amount of detail (granularity) presented to the user (based on the intuition that users wish to acquire the most value out of the data while viewing the least amount of data, i.e., with the least amount of effort). We have conducted a preliminary evaluation of our solution using a sample dataset of 197 Electronic Health Records (EHRs) obtained from a private medical clinic, where all EHRs were vetted by a medical doctor from LAU Rizk hospital. The study focused on the migraine headache disorder, where multiple patient data samples were mapped and visualized against each other. Nineteen testers volunteered to participate in an online survey to review and evaluate the data visualizations produced by our new tool, compared with existing solutions. Initial results are promising and highlight the potential of our tool.
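
To make the pipeline described above more concrete, the following is a minimal Python sketch, not the tool's actual implementation. It assumes a weighted Euclidean distance, single-linkage clustering, the Rand index as the correlation measure, and the same number of clusters on both sides at each zoom level; none of these choices is specified in this abstract. All function names and the toy values are hypothetical.

```python
from itertools import combinations

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (an assumed choice of metric)."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)) ** 0.5

def cluster_levels(points, weights):
    """Single-linkage agglomerative clustering (assumed linkage).

    Returns a dict mapping k (number of clusters) to the partition at
    that level, i.e. one horizontal cut of the dendrogram per k.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                weighted_distance(points[i], points[j], weights)
                for i in pair[0] for j in pair[1]
            ),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels[len(clusters)] = list(clusters)
    return levels

def rand_index(part_a, part_b, n):
    """Fraction of item pairs on which two partitions agree."""
    def labels(partition):
        return {i: idx for idx, c in enumerate(partition) for i in c}
    la, lb = labels(part_a), labels(part_b)
    pairs = list(combinations(range(n), 2))
    agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
    return agree / len(pairs)

def best_zoom_level(levels_a, levels_b, n):
    """Coarsest cut (smallest k) that reaches the highest agreement."""
    # k = 1 and k = n are skipped because they agree trivially.
    scores = {k: rand_index(levels_a[k], levels_b[k], n) for k in range(2, n)}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s == best), best

# Toy data (hypothetical values): five records seen through two feature sets.
view_a = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (6.0, 6.0)]
view_b = [(0.1,), (0.2,), (3.0,), (3.6,), (3.8,)]

levels_a = cluster_levels(view_a, weights=(0.7, 0.3))  # user-chosen weights
levels_b = cluster_levels(view_b, weights=(1.0,))
print(best_zoom_level(levels_a, levels_b, len(view_a)))
```

In this toy case the two hierarchies disagree at three clusters but agree fully at two, so the sketch selects the two-cluster level as the view with the highest correlation and the least detail. Mapping internal nodes across dendrograms, as the tool does, would require a node-to-node similarity rather than the level-wise comparison used here.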