Heuristics. (c2006)

Kallab, Chadi

Heuristics. (c2006)

Kallab, Chadi

URI: http://hdl.handle.net/10725/925

DOI: https://doi.org/10.26756/th.2006.54

Date: 2011-10-27

Abstract:

This thesis focuses on the NP-Hard problem of finding an optimal tree topology where leaves represent biological sequences. The problem consists of minimizing the number of changes between given and/or derived sequences. As the number of sequences to be compared increases, the size of the search space grows exponentially, requesting the use of optimization methods, to come up with an acceptable optimal topology. Even though, research is relating a large number of species and gene families to each others, the computation intensive load of many popular methods for evaluating trees (ex: parsimony and maximum likelihood) establish the quasi-inexistence of an exact solution for more than about 20 sequences. The alignment of two sequences (pairwise alignment), or multiple sequences (Multiple Sequence Alignment - MSA), and the alignment of short or long sequences such as an entire genome may require different types of algorithms. The algorithms used in all of these four cases are dynamic programming, linear programming based or heuristic-based or a combination of them. For large numbers of sequences to be aligned, heuristic methods may give a result similar to the exact solution offered by the dynamic programming or linear programming based algorithms, but in much shorter time. Heuristics used in phylogenetic inference include greedy algorithms, hill climbing, simulated annealing, and genetic algorithms. Thus, this research aims to suggest a general way to encode the problem into instances of different heuristic algorithms. Another focus of the document would be to suggest that the heuristics used, be implemented in a most optimal way, trying to get a compromise between speedup, flexibility and detailed tracing/chronology of each run of the algorithms.

Show full item record