Abstract:
The Internet of Things (IoT) is ushering in the era of connected environments, i.e., networks of physical objects that are embedded with sensors and software, and that connect and exchange data with other devices and systems. The huge amount of data produced by such systems calls for solutions that reduce the amount of data being handled and transmitted over the network. In this study, we investigate data deduplication as a prominent pre-processing method that can address this challenge. Data deduplication techniques have traditionally been developed for data storage and data warehousing applications, and aim at identifying and eliminating redundant data items. A few recent approaches have been designed for sensor networks and connected environments, yet existing solutions mostly rely on crisp thresholds and provide little to no expert control over the deduplication process, disregarding the domain expert’s needs in defining redundancy. To this end, we propose a new approach for Fuzzy Redundancy Elimination for Data Deduplication in a connected environment. We use simple natural language rules to represent domain knowledge and expert preferences regarding data duplication boundaries. We then apply pattern codes and fuzzy reasoning to detect duplicate data items at the outermost edge (sensor node) level of the network. This reduces the time required to hard-code the deduplication process, while adapting to the domain expert’s needs for different data sources and applications. Experiments on a real-world dataset highlight our solution’s potential and its improvement over existing solutions.
Citation:
Yakhni, S., Tekli, J., Mansour, E., & Chbeir, R. (2023, August). Fuzzy Data Deduplication at Edge Nodes in Connected Environments. In International Conference on Mobile Web and Intelligent Information Systems (pp. 112-128). Cham: Springer Nature Switzerland.