Abstract:
In recent years, there has been a worldwide initiative to gather as much information as
possible about the human genome, resulting in the Human Genome Project (HGP).
The HGP project was founded on the basis of gathering genetic information to be
used in various bioinformatics areas. The Human Genome Project’s main purpose
is to find the common ancestry among various peoples around the globe in order
to identify origins and gene-related diseases. Haplotype Inference(HI) is one of the
problems tackled in the HGP, whereby from a given population of genotypes the goal
is to find the minimum number of haplotypes from which the genotypes could have
derived. Clark’s Algorithm is the first known algorithm to deal with this problem
from a Computer Scientist’s perspective. It has been the basis for many other algorithms
afterwards. One such algorithm is the Delayed Haplotype Selection(DS).
Our work is an improvement of the DS. We call the resulting algorithm Revamped
Haplotype Selection(RDS) algorithm. We test our algorithm on real and simulated
data, and compare it to the DS algorithm and a Branch-and-Bound approach (known
as HAPAR). Results prove that our algorithm significantly outperforms both in the quality of the solution as well as in running time.