Analysis And Predictions Of DNA Sequence Transformations On Grids
Abstract
Phylogenetics is the study of evolution of organisms. Evolution occurs due to mutations of DNA sequences. The reasons behind these seemingly random mutations are largely unknown. There are many algorithms that build phylogenetic trees from DNA sequences. However, there are certain uncertainties associated with these phylogenetic trees. Fine level analysis of these phylogenetic trees is both important and interesting for evolutionary biologists. In this thesis, we try to model evolutions of DNA sequences using Cellular Automata and resolve the uncertainties associated with the phylogenetic trees. In particular, we determine the effect of neighboring DNA base-pairs on the mutation of a base-pair. Cellular Automata can be viewed as an array of cells which modifies itself in discrete time-steps according to a governing rule. The state of the cell at the next time-step depends on its current state and state of its neighbors. We have used cellular automata rules for analysis and predictions of DNA sequence transformations on Computational grids.
In the first part of the thesis, DNA sequence evolution is modeled as a cellular automaton with each cell having one of the four possible states, corresponding to four bases. Phylogenetic trees are explored in order to find out the cellular automata rules that may have guided the evolutions. Master-client paradigm is used to exploit the parallelism in the sequence transformation analysis. Load balancing and fault-tolerance techniques are developed to enable the execution of the explorations on grid resources. The analysis of the sequence transformations is used to resolve uncertainties associated with the phylogenetic trees namely, intermediate sequences in the phylogenetic tree and the exact number of time-steps required for the evolution of a branch. The model is further used to find out various statistics such as most popular rules at a particular time-step in the evolution history of a branch in a phylogenetic tree. We have observed some interesting statistics regarding the unknown base pairs in the intermediate sequences of the phylogenetic tree and the most popular rules used for sequence transformations.
Next part of the thesis deals with predictions of future sequences using the previous sequences. First, we try to find out the preserved sequences so that cellular automata rules can be applied selectively. Then, random strategies are developed as base benchmarks. Roulette Wheel strategy is used for predicting future DNA sequences. Though the prediction strategies are able to better the random benchmarks in most of the cases, average performance improvement over the random strategies is not significant. The possible reasons are discussed.