Abstract:
In the realm of graph theory, correlation cluster is studied as a graph modification problem
known under "Cluster Editing." In this problem, we apply a sequence of k edge (or vertex)
additions and/or deletions so the graph becomes a topological sum of clusters. At first, a cluster
was considered a clique but relaxation models have later emerged as cliques were deemed
too strict of a requirement in application domains. Another limitation of cluster editing is that
it forces each vertex to be in exactly one cluster, i.e. it does not allow overlapping communities.
In this thesis, we study two novel problems: 2-CLUB CLUSTER VERTEX SPLITTING
(2CCVS) and 2-CLUB CLUSTER EDITING WITH VERTEX SPLITTING (2CCEDVS). 2-Clubs
alleviate the strictness of cliques and vertex splitting allows for a vertex to be in multiple clusters,
overcoming two main limitations of the currently studied models. Both of these problems
are proved to be NP-Hard and heuristics are presented for 2CCED and 2CCEDVS.We compare
the heuristic’s performance against well-known clustering algorithms on Biological and Social
Networks. The presented 2CCEDVS heuristic achieves the best cluster separation, i.e. highest
inter-cluster distance.