DG-Means – A Superior Greedy Algorithm for Clustering Distributed Data

Assaf, Ali

DOI: https://doi.org/10.26756/th.2022.492

Date: 2022-07-26

Terms of Use: This item is made available under the terms and conditions applicable to "Thesis", as set forth at: http://libraries.lau.edu.lb/research/laur/terms-of-use/thesis.php

Abstract:

Clustering is the process of dividing a set of objects into classes such that each class is composed of similar objects. Traditional centralized clustering algorithms target objects located at a single site and cannot operate on distributed objects. Distributed clustering algorithms fill this gap: they extract a classification model from distributed objects even when those objects reside at different sites and locations. With the growing trend of storing data across multiple locations and sites, distributed data is rapidly becoming more popular. It is likely to be one of the most prominent fields in the coming decades, especially given the huge amount of data propagating across the web. Although considerable research has been done on this topic, it is still considered to be in its infancy because of challenges that continue to arise, such as bandwidth limitations, transferring data to a single site, and many others. In this work, we present DG-means, a greedy algorithm that operates on distributed datasets. Three datasets (the Wholesale dataset, the Banknotes dataset, and the Iris dataset) are used to compare multiple distributed clustering algorithms on different metrics: execution runtime, stability, and accuracy. DG-means exhibited superior performance compared to the other algorithms.
