Use of strategically designed protein-like sequences in structure and function recognition

Kumar, Gayatri

Abstract

The advent of high-fidelity protein sequencing techniques has produced a considerable wealth of sequence data. However, the number of proteins for which information on 3-D structure and functional features is available is considerably lower. Despite improvements in structural and functional genomics initiatives, most experimental procedures in use are time-consuming. This has led to a formidable gap between sequence space and structure space, which continues to widen. The structural coverage of the proteome of most organisms is incomplete, which limits the information available on function and the implied biological roles. Computational approaches could provide preliminary ideas about the structure and function of proteins. Protein structures are far more conserved than sequences, a consequence of the evolutionary pressure to maintain structure and thereby function. Therefore, recognizing evolutionary relationships among proteins could serve as an important step towards inferring structural and functional features shared by related proteins. Detailed comparative analysis of evolutionarily related proteins could provide clues to protein structure and, consequently, function. However, detecting relationships between proteins with low sequence similarity (less than about 20%) is notoriously difficult, because unrelated proteins also share poor sequence similarity. The detection of relatedness between proteins that are distant in sequence serves as a nodal point in structure and function recognition. Hence, most sequence search algorithms rely on deriving these non-trivial relationships between distant homologues to further functional annotation. It has been observed that the limitation in identifying distant relatives is due to the sparseness of the protein sequence space.
That is, if sequences intermediately related to two proteins (or two protein families) are unavailable, recognizing such relationships purely from sequence data becomes challenging. The paucity of natural intermediate sequences to guide profile or sequence search methods undermines even rigorous and powerful search algorithms. In a protocol developed earlier in the group, protein-like sequences, referred to as offspring, were computationally designed using the sequence profiles of pairs of domain families, referred to as parents, that are known to be distantly related. These sequences were shown to serve as stepping stones that allow search methods to link distant relatives. Adding these intermediately related sequences to the database of natural protein sequences addressed the challenges posed by the void and sparse regions of the protein sequence space. The use of designed sequences showed a marked improvement in structural fold coverage and augmented the ability of search protocols to detect such relationships. Therefore, the use of designed sequences in homology detection could enable recognition of structure and function for proteins where these are not yet known. The questions raised in this thesis start with exploring the foldability of the designed sequences into the parent structural fold. Once these designed proteins were seen to be likely to adopt the structural fold of the parent families, they were employed in recognizing the structure of protein families for which no structural information is yet available. Further, an improvement to the approach was put forth to make homology-driven searches faster and more sensitive by representing the sequences, both natural and designed, as hidden Markov models. The use of intermediately related artificial sequences in probing functional relationships between protein families was also explored. The associations made through designed sequences were examined for biological relevance by exploring the conservation of putative functional residues.
To strengthen the ability of the designed intermediates in homology detection, the protein sequence space around protein families was artificially expanded.

URI

https://etd.iisc.ac.in/handle/2005/4968

Collections

Molecular Biophysics Unit (MBU) [326]