dc.description.abstract | Cellular response to environmental changes involves a wide repertoire of complex signalling systems, often resulting in up- and down-regulation of various genes. These mechanisms are generally conserved in a variety of organisms. The pathways are also constantly rewired in various organisms, which aids them in maintaining homeostasis and results in species-specific adaptation mechanisms. Protein kinases are central to these mechanisms and orchestrate a multitude of these pathways. This thesis aims to understand the selective forces behind the evolution of signalling pathways. More specifically, it focuses on structural and domain architecture differences of protein kinases. Protein kinases are one of the most populated families of proteins in many organisms and constitute about 2-3% of the proteomes of most eukaryotic organisms. These kinases have evolved over ~400 million years and regulate nearly all major signalling pathways. Classification of kinases enables convenient association of kinases with the function and signalling pathway in which they participate. The current scheme of classification is based on the amino acid sequence of the catalytic region, which consists of about 200-300 residues. This scheme proposes division into 7 groups which show gross-level similarities in function, such as the TK group, which comprises all tyrosine kinases, or the AGC group, which comprises kinases regulated by second messengers. These groups are further divided into ~280 subfamilies, providing insights into function and regulation at a much finer level. This enables ascertaining information about signalling pathways, protein-protein interactions or the substrates that a kinase phosphorylates.
Chapter 1 provides an elaborate introduction to the various types of protein kinases and their roles in signalling processes. This chapter discusses how protein kinases work in a concerted manner with several other players of a signalling pathway to generate a regulated response to external stimuli. Furthermore, it highlights both the evolutionary aspects and the dynamic nature of such pathways. The subsequent part of this chapter deals with protein kinases, their evolution, regulation and structural features crucial to catalysis. Protein kinases are regulated in many ways: regulation is achieved from within the catalytic domain and also by means of additional domains tethered to the catalytic domain. The regulatory switch is triggered by various cellular and molecular events such as phosphorylation of specific residues, changes in spatio-temporal localization and altered redox states, to name a few. The effects of regulatory domains on the overall function are also discussed. The chapter concludes by highlighting structural analyses carried out to understand the regulatory aspects of kinases and the use of this information in rational drug discovery.
Chapters 2 and 3 report the identification and analysis of the repertoire of protein kinases encoded in the genomes of two organisms that are frequently used in comparative genomics. Chapter 2 focuses on the distribution of kinases in Takifugu rubripes, a teleost fish which is a widely used model system for studying human genes. Use of remote homology detection methods identified 519 kinases in fugu. Although the group-wise distribution of kinases shows high similarity to that of human kinases, the subfamily distribution shows considerable differences in 22 subfamilies, which are either under- or over-represented in fugu. The most noticeable difference is seen for the DYRK subfamily, which is eight times larger in fugu than in human. Detailed analysis of the DYRKs partially explained their high representation in fugu. Only about ten of the kinases classified into these subfamilies showed high sequence similarity and conserved localization signals with respect to the human kinases and to kinases commonly found in other eukaryotes such as C. elegans, S. cerevisiae and D. melanogaster. Disparity at the level of the genome may be attributed to the unique domain architectures characteristic of this genome. A comparison of the domain architectures of kinases documented in Pfam with those of the kinases in Takifugu also revealed two kinases with domain architectures unique to fugu; they are associated with Galectin and YkyA domains. Despite inconsistencies in the subfamily distribution of human and Takifugu kinases, remarkable similarity is observed in the MAP kinase pathway, which is ubiquitously found across eukaryotic organisms. Nearly 83% of the proteins in this pathway show more than 30% sequence identity between the two organisms, thus validating the use of Takifugu as a model system to study human signalling pathways.
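As an illustration of the 30% sequence-identity criterion mentioned for the MAP kinase pathway, the following minimal sketch computes percent identity from a pair of already aligned sequences. The function name `percent_identity` and the toy sequences are hypothetical, and the choice of denominator (columns where neither sequence has a gap) is an assumption; the thesis does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment,
    so they have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    # Return 0.0 when there is no ungapped column to compare.
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments (not real kinase sequences).
    identity = percent_identity("ACDE-G", "ACQEFG")
    print(f"{identity:.1f}% identity; above 30% threshold: {identity > 30.0}")
```

Other denominators (full alignment length, or the length of the shorter sequence) give different values, so a threshold such as 30% is only meaningful once the convention is fixed.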
While addressing the possibility of similar expansions of kinases in other teleosts, it was noticed that the Danio rerio (zebrafish) genome has a massively expanded kinome with ~1200 kinases. Chapter 3 explores the possible reasons for the expansion of the kinome with kinases specific to zebrafish. For example, the number of kinases from one subfamily (CAMK) is roughly similar to the total number of protein kinases encoded in the human genome. Further, the PIM kinase subfamily is the sole subfamily that is massively over-represented (~30 times) in this genome. A detailed analysis of the PIM kinases of zebrafish revealed that the sequences are divergent from the canonical PIM kinases. Despite this difference, the specific residues that dictate the functional properties specific to PIM kinases are highly conserved. PIM kinases are usually constitutively active, and the features responsible for this are conserved in the PIM kinases of zebrafish as well. Unlike canonical PIM kinases in other eukaryotes, the post-transcriptional regulation of these PIM kinases might be different due to the absence of regulatory regions in the 3' UTR of the PIM gene. However, conservation of the S261 phosphorylation site points to regulation by phosphorylation, which compensates for the constitutively active nature. A massive expansion of the substrate pool of PIM kinases in this genome seems to correlate well with the expansion of the kinases themselves. Since PIM kinases regulate a large number of growth-related pathways, we believe that this might be associated with the high regenerative capacity of organs observed in this fish, which makes it an ideal model to study most cancers.
While the earlier two chapters primarily focused on the kinase catalytic domain and organism-specific changes, the next two chapters address the contribution of domains tethered to the catalytic domain to the overall function of the kinase. Deviations from canonical kinase domain architectures indicate expansion in the functional repertoire of kinases. Chapter 4 is a study of human kinases from the latest revised version of the human genome sequence data. The initial part of the chapter focuses on the differences in the kinase repertoire upon revision of the human genomic data. Seven sequences gleaned from the earlier genomic data are absent and 16 new sequences are added to the kinome dataset according to the latest human genome sequence data. In addition, differences in transcripts for 23 kinases have led to differences in the overall length and sub-family classification of these kinases. Because of these variations in gene transcripts, identification of the kinome from the latest version was a mandatory step prior to the study of outlier kinases. The domain architectures of the human kinases have been compared with known subfamily-specific domain architectures in order to identify outliers. Based on the type of domain architecture, these outliers have been classified as “rogue” or “hybrid” kinases. Hybrid architectures represent kinases showing high sequence similarity within the kinase domain to a known subfamily of kinases, with the acquisition of non-kinase domains typically found in one of the other subfamilies of kinases. Rogue architectures, on the other hand, belong to kinases with domain architectures not observed in any of the kinase sub-families. A total of 23 outliers have been identified in the human genome: 13 hybrids and 10 rogues. The presence of such "hybrid" and "rogue" kinases makes classification of kinases into subfamilies a daunting task and hence necessitates a new method for classification using the full-length sequences.
The use of one such alignment-free method, ClaP (Appendix), which works on full-length sequences, has been validated for the classification of kinases. A similarity metric obtained from full protein sequence comparison improved, for 29 kinases, on the existing methods of classification, which utilize only the catalytic domain of kinases. Classification based on the catalytic domain is incomplete without knowledge of the associated domains, which also have an important role in function. This necessitates a new approach to the classification of kinases for function annotation: an integrated one that uses information from the full-length sequence of each kinase.
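The hybrid/rogue distinction described for Chapter 4 can be sketched in a few lines. This is only an illustrative reading of the definitions given in this abstract, not the procedure used in the thesis: the function `classify_architecture`, the `known` lookup table and the toy subfamily and domain names are all hypothetical, and the treatment of a kinase that merely lacks a domain (labelled "rogue" here) is an assumption.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical lookup: subfamily -> set of accessory-domain architectures
# (each architecture is the set of non-kinase domains seen with the kinase domain).
KnownArchitectures = Dict[str, Set[FrozenSet[str]]]


def classify_architecture(subfamily: str,
                          accessory_domains: Iterable[str],
                          known: KnownArchitectures) -> str:
    """Label a kinase as 'canonical', 'hybrid' or 'rogue'.

    subfamily: subfamily assigned from the catalytic-domain sequence.
    accessory_domains: non-kinase domains found in the full-length protein.
    """
    arch = frozenset(accessory_domains)
    own_archs = known.get(subfamily, set())
    if arch in own_archs:
        return "canonical"

    # Domains already seen in the kinase's own subfamily.
    own_domains: Set[str] = set()
    for a in own_archs:
        own_domains |= a

    # Domains seen in any other subfamily.
    other_domains: Set[str] = set()
    for name, archs in known.items():
        if name != subfamily:
            for a in archs:
                other_domains |= a

    acquired = arch - own_domains
    # Hybrid: every acquired domain is typical of some other subfamily.
    if acquired and acquired <= other_domains:
        return "hybrid"
    # Assumption: everything else non-canonical is treated as rogue.
    return "rogue"


if __name__ == "__main__":
    toy_known = {
        "SubfamilyA": {frozenset({"C1", "C2"})},
        "SubfamilyB": {frozenset({"SH2", "SH3"})},
    }
    print(classify_architecture("SubfamilyA", ["C1", "C2"], toy_known))
    print(classify_architecture("SubfamilyA", ["C1", "SH2"], toy_known))
    print(classify_architecture("SubfamilyA", ["Galectin"], toy_known))
```

Representing an architecture as an unordered set of domains ignores domain order and copy number; a real comparison against Pfam architectures would likely need both.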
Chapter 5 extends the learning from Chapter 4 and aids in the identification of 74 "hybrid" and 18 "rogue" kinases in other model eukaryotes (Mus musculus, C. elegans, S. cerevisiae, D. melanogaster and Takifugu rubripes), which show significant variations in overall function. These sequences, due to their hybrid nature, might facilitate cross-talk between signalling pathways. Thus, annotating the function of each of these 92 outliers has highlighted the use of domain recombination in wiring new pathways and re-wiring existing ones. Also, because of their hybrid nature, these sequences cannot be classified under any of the existing sub-families. Therefore, it is proposed in this chapter that they be classified as a separate sub-family containing sequences with hybrid properties. To validate this, the ClaP method has been extended: the pair-wise distances between sequences (using the full-length sequence) have been used to generate phylogenetic trees, which have then been subjected to hierarchical clustering to generate sub-family based clusters. Further, a Shannon entropy based score has been used to identify clusters that contain sequences from diverse sub-families grouped together. Upon analysis of these clusters, it was observed that the hybrid and rogue kinases specifically fall within four clusters with high entropy (that is, clusters containing a large number of sub-families), validating their status as emergent sub-families. In addition, more hybrids and rogues, which have long regions without any domain assignments, have been identified in these clusters. Such sequences may contain domain families deviant from those that are currently known, and information on their function can be obtained from further genomic studies in future. Lastly, the prevalence of such hybrid and rogue kinases in the genome of a protozoan parasite, P. falciparum, has been studied in detail. The role of hybrids and rogues in host-pathogen interaction has been explored.
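The idea of flagging clusters that mix many sub-families can be illustrated with plain Shannon entropy over the sub-family labels in a cluster. This is a minimal sketch: the abstract only says that a "Shannon entropy based score" was used, so the exact score, any normalisation and any cut-off are not known here, and the function name `cluster_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def cluster_entropy(subfamily_labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the sub-family labels within one cluster.

    0.0 means every member carries the same sub-family label; larger values
    mean the cluster mixes members of several sub-families more evenly.
    """
    total = len(subfamily_labels)
    if total == 0:
        return 0.0
    counts = Counter(subfamily_labels)
    # H = sum over labels of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Toy clusters with made-up sub-family labels.
    clusters = {
        "cluster_1": ["SubA", "SubA", "SubA", "SubA"],
        "cluster_2": ["SubA", "SubB", "SubC", "SubD"],
    }
    for name, labels in clusters.items():
        print(name, round(cluster_entropy(labels), 3))
```

Because raw entropy grows with the number of distinct labels, comparing clusters of very different sizes may call for a normalised variant; whether the thesis normalises its score is not stated in the abstract.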
Chapter 6 presents an in-depth analysis of the possible role of charge neutralization around phosphosites in protein kinases and their substrates. This analysis follows up on a study that identified positively charged residues around phosphosites in kinase substrates, and was carried out in collaboration with Dr. Warwicker's group in Manchester. The current study not only aims to address the importance of charge neutralization around phosphosites, but also uses this feature for the prediction of phosphosites in known structures of kinase substrates. A dataset of phosphosites mapped onto 3-D structures has been used to calculate peak electrostatic potentials around phosphosites based on the solution of a non-linear Poisson-Boltzmann equation. A comparison of peak potentials around phosphosites with those around non-phosphosites reveals a higher positive peak potential at a radius of ~10.0 Å around the phosphosite. This variation is significantly higher around tyrosine phosphosites than around Ser/Thr phosphosites. Further, this distinction in peak potential around the phosphosite is attributed to only certain families, such as protein kinases and pyruvate kinases. The concept of charge neutralization will therefore show greater success in the prediction of phosphosites in such families than in other families with phosphosites. The functional importance of such charge neutralization has been studied in great detail in the protein kinase domain family, owing to prior knowledge that certain phosphorylation events contribute to conformational change, which may be correlated with the changes in peak potentials upon phosphorylation. Phosphorylation at certain sites within the kinase catalytic domain often mediates the onset of signalling events, including regulating the activity levels of kinases, mediating protein-protein interactions and altering their localization.
Therefore, by studying the conservation patterns of such phosphosites or neutralizing residues, the variations in signalling pathways among homologues with differing conservation patterns have been highlighted. Among domain families which do not show clear differences in peak potentials between phosphosites and non-phosphosites, it was noted, in a few cases, that negatively charged ligands bind to the protein in the vicinity of phosphosites in the un-phosphorylated form of the protein. Structural studies on a few cases in ligand-bound forms indicate a competitive mechanism between phosphorylation and ligand binding, which helps in switching between different functional forms. Therefore, the role of phosphorylation as a regulatory mechanism for modulating ligand binding in such domain families has been highlighted.
Chapter 7 of the thesis reports a study on disease-causing mutations in kinases. So far, 180 kinases have been reported to contain disease-causing mutations. This chapter particularly focuses on understanding the deleterious effects of non-synonymous missense mutations in kinases. Mutations at certain sites are enriched, as seen by the concentration of disease phenotypes upon mutation at these sites in comparison to others. Interactions involving arginines in sub-domains VIB, VIII, IX and XI are perturbed, which affects catalysis. A structural explanation has been provided for 10 such mutations, which occur in important sub-domains but are not directly implicated in catalysis.
Apart from analyzing the various evolutionary and structural aspects of protein kinases, in this thesis an attempt has been made to provide a deeper structural understanding of Msh (MutS Homologue) proteins involved in eukaryotic chromosomal segregation. Chapter 8 deals with the Msh4-Msh5 complex, whose components are eukaryotic homologues of the MutS family of proteins in bacteria. MutS proteins form homodimeric complexes in bacteria that aid in the mismatch repair process. There are six MutS homologues in eukaryotes, which form hetero-dimers. Two of the homologues are Msh4 and Msh5, which form a hetero-dimeric complex; this complex formation is a prerequisite for their function. They are involved in chromosomal segregation during meiosis-I and aid in resolving Holliday junction DNA. To date, no structure of this complex is available and the exact mode of binding is unclear. In addition, Msh4 and Msh5 display asymmetry in their DNA and ATP binding sites. These insights are derived from the severity of phenotypes upon mutation of various residues in these proteins. This work is in collaboration with Dr. Nishant from IISER, Trivandrum. The questions addressed in Chapter 8 of the thesis are: What are the structural features that contribute to the asymmetry in function between Msh4 and Msh5 in DNA and ATP binding? Can a structural explanation be provided for each of the 27 mutations causing severe phenotypes (cross-over defects/viability) to predict their role in the function of the Msh4-Msh5 complex? Can a prediction be provided for the mode of binding of the Holliday junction DNA? Can residues at the interface regions of Msh4 and Msh5 that affect complexation be identified on the basis of the structure? These questions are addressed by homology modelling of the Msh4-Msh5 complex using the Msh2-Msh6 complex as the template. Structural explanations have been provided for 23 out of 27 mutations with severe phenotypes.
Certain residues in Msh5 are shown to form a tighter network of interactions than their counterparts in Msh4 and are therefore likely to have a more prominent role in DNA and ATP binding, which corroborates the observed asymmetry in mutant functions. A volume-based calculation has been used to suggest a possible mode of binding of the Holliday junction within the cavity of the complex. Finally, the model has been used to predict interface residues that play a crucial role in complexation and function. Experiments are being carried out in Dr. Nishant's laboratory to mutate these residues to validate the model.
Chapter 9 summarizes the entire thesis work and states the chief conclusions from the various chapters.
Apart from studies embodied in the thesis, the author has been involved in one other study, which is provided as appendix. | en_US |