Abstract:
Breast cancer is a serious disease that affects many women; it is ranked as the fifth
leading cause of death and the second most common cancer worldwide. Analyzing breast
cancer gene expression profiles to understand genetic similarities is a challenging
problem, since the functions of many genes are still unknown.
Computational techniques have proved reliable in supporting clinicians in diagnosis and
therapy. In this thesis, we use a data mining method to find a logical correlation
behind the clustering pattern of the genes involved in breast cancer. We design a
growing hierarchical self-organizing map (GHSOM) to mine gene microarray data.
GHSOM configures its topology during the unsupervised learning process according to
the features of the input gene microarray data, without any other prior knowledge.
GHSOM groups related genes into the same clusters on the basis of their microarray
expression levels. We applied GHSOM to DNA microarray data for 24,481 genes in
breast tumor samples from 117 patients. Our results reveal 17 genes that are
likely to be correlated, in small subsets, with four breast cancer marker genes. This
result is promising for diagnosis and for a better understanding of breast cancer.
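
As an illustration only, the following Python sketch shows the kind of computation a growing self-organizing map performs: a small map is trained on expression profiles, and a quantization-error test in the spirit of the breadth criterion commonly described for GHSOM decides whether the map should keep growing. It is not the implementation used in this thesis. All function names, the toy data, the simplified error measure, and the threshold value tau1 = 0.5 are assumptions made for the example.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, sample):
    """Index of the map unit whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: distance(weights[i], sample))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns the weight vectors in row-major order."""
    rng = random.Random(seed)
    # Initialise every unit with a copy of a randomly chosen input vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for sample in rng.sample(data, len(data)):
            br, bc = divmod(best_matching_unit(weights, sample), cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(len(w)):
                    w[k] += lr * h * (sample[k] - w[k])
    return weights


def layer0_qe(data):
    """Mean distance of the inputs to their overall mean vector."""
    dims = len(data[0])
    mean = [sum(s[k] for s in data) / len(data) for k in range(dims)]
    return sum(distance(s, mean) for s in data) / len(data)


def mean_quantization_error(weights, data):
    """Simplified map error: mean distance of each input to its best unit."""
    return sum(
        distance(weights[best_matching_unit(weights, s)], s) for s in data
    ) / len(data)


def should_grow(weights, data, parent_qe, tau1=0.5):
    """Assumed breadth test: keep growing while the map error is still
    at least tau1 times the error of the parent unit."""
    return mean_quantization_error(weights, data) >= tau1 * parent_qe


if __name__ == "__main__":
    # Hypothetical expression profiles: 6 genes measured in 3 samples.
    toy = [
        [0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.1, 0.1, 0.3],
        [0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.9, 0.9, 0.8],
    ]
    som = train_som(toy, rows=2, cols=2)
    print("map error:", mean_quantization_error(som, toy))
    print("grow further:", should_grow(som, toy, layer0_qe(toy)))
```

Genes whose profiles share a best-matching unit form one cluster. A full GHSOM would additionally insert rows or columns of units while the test returns true, and expand units with high error into child maps on the next layer.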