Computational Protein Structure Analysis : Kernel And Spectral Methods
Abstract
The focus of this thesis is to develop computational techniques for analysis of protein structures. We model protein structures as points in 3-dimensional space which in turn are modeled as weighted graphs. The problem of protein structure comparison is posed as a weighted graph matching problem and an algorithm motivated from the spectral graph matching techniques is developed. The thesis also proposes novel similarity measures by deriving kernel functions. These kernel functions allow the data to be mapped to a suitably defined Reproducing kernel Hilbert Space(RKHS), paving the way for efficient algorithms for protein structure classification.
Protein structure comparison (structure alignment)is a classical method of determining overall similarity between two protein structures. This problem can be posed as the approximate weighted subgraph matching problem, which is a well known NP-Hard problem. Spectral graph matching techniques provide efficient heuristic solution for the weighted graph matching problem using eigenvectors of adjacency matrices of the graphs. We propose a novel and efficient algorithm for protein structure comparison using the notion of neighborhood preserving projections (NPP) motivated from spectral graph matching. Empirically, we demonstrate that comparing the NPPs of two protein structures gives the correct equivalences when the sizes of proteins being compared are roughly similar. Also, the resulting algorithm is 3 -20 times faster than the existing state of the art techniques. This algorithm was used for retrieval of protein structures from standard databases with accuracies comparable to the state of the art.
A limitation of the above method is that it gives wrong results when the number of unmatched residues, also called insertions and deletions (indels), are very high. This problem was tackled by matching neighborhoods, rather than entire structures. For each pair of neighborhoods, we grow the neighborhood alignments to get alignments for entire structures. This results in a robust method that has outperformed the existing state of the art methods on standard benchmark datasets. This method was also implemented using MPI on a cluster for database search.
Another important problem in computational biology is classification of protein structures into classes exhibiting high structural similarity. Many manual and semi-automatic structural classification databases exist. Kernel methods along with support vector machines (SVM) have proved to be a robust and principled tool for classification. We have proposed novel positive semidefinite kernel functions on protein structures based on spatial neighborhoods. The kernels were derived using a general technique called convolution kernel, and showed to be related to the spectral alignment score in a limiting case. These kernels have outperformed the existing tools when validated on a well known manual classification scheme called SCOP. The kernels were designed keeping the general problem of capturing structural similarity in mind, and have been successfully applied to problems in other domains, e.g. computer vision.
Collections
Related items
Showing items related by title, author, creator and subject.
-
Structural And Evolutionary Studies On Protein-Protein Interactions
Swapna, L S (2014-05-27)The last few decades have witnessed an upsurge in the availability of large-scale data on genomes and genome-scale information. The development of methods to understand the trends and patterns from large scale data promised ... -
Algorithmic Approaches For Protein-Protein Docking And quarternary Structure Inference
Mitra, Pralay (2011-02-14)Molecular interaction among proteins drives the cellular processes through the formation of complexes that perform the requisite biochemical function. While some of the complexes are obligate (i.e., they fold together while ... -
Probing Ligand Induced Perturbations In Protien Structure Networks : Physico-Chemical Insights From MD Simulations And Graph Theory
Bhattacharyya, Moitrayee (2014-07-16)The fidelity of biological processes and reactions, inspite of the widespread diversity, is programmed by highly specific physico-chemical principles. This underlines our basic understanding of different interesting phenomena ...