dc.description.abstract | A signal transduction process is a chain of highly regulated biochemical steps that transfers a signal, generated in response to a stimulus in the extracellular environment, to intracellular compartments such as the nucleus. A variety of biomolecules, such as proteins and lipids, participate in such processes. One of the protein superfamilies that actively participates in signaling is the protein kinases, which transfer the γ-phosphate from adenosine triphosphate (ATP) to specific hydroxyl group(s) in their protein substrates. Phosphorylation and dephosphorylation events are critical in many signal transduction pathways and affect the biological system as a whole. Protein phosphorylation carried out by protein kinases has emerged as a pre-eminent mechanism for the regulation of a variety of cellular processes such as cell growth, development, differentiation, homeostasis, apoptosis, metabolism, transcription and translation.
The current thesis encompasses the investigations carried out by the author, using various bioinformatics tools and methods, to comprehend the structural and functional roles of a diverse set of protein kinase subfamilies in various eukaryotic and prokaryotic organisms. The thesis is divided into ten chapters. Chapter 1 introduces the superfamily of protein kinases and covers the relevant literature. The Kinases in Genomes (KinG) database, set up in the author’s group a few years ago (Krupa et al, 2004a), comprises a collection of Serine/Threonine/Tyrosine protein kinases recognized, using bioinformatics approaches, from the genomic data of various eukaryotes, prokaryotes and viruses. The KinG database also provides a classification of protein kinases into various groups and subfamilies (Hanks et al, 1988). Information on the non-kinase domains that are tethered to the catalytic kinase domains is also available for every kinase in KinG. KinG is updated annually to keep pace with the rise in the number of genome sequence datasets, the increase in the number of known protein domain families, and the refinement or reannotation of genomic datasets (Anamika et al, 2008c). The author describes the work on the annual update of the KinG database in Chapter 2. The availability of an improved version of the human genomic data has provided an opportunity to re-investigate the protein kinase complement of the human genome and has enabled an analysis of splice variants. This analysis is also described in Chapter 2. Chapters 3, 4 and 5 report the recognition and analysis of the repertoire of protein kinases in chimpanzee, two Plasmodium species (Plasmodium falciparum and Plasmodium yoelii yoelii) and Entamoeba histolytica, respectively. A detailed analysis of the non-kinase domains that are tethered to catalytic protein kinase domains in eukaryotic organisms is presented in Chapter 6.
Chapter 7 discusses a systematic classification framework developed by the author to classify Serine/Threonine protein kinases in prokaryotic organisms. An investigation of the 3-D structural aspects of protein kinase-substrate interactions is described in Chapter 8. While identifying protein kinases from genomic data, the author also observed protein kinase-like non-kinases (PKLNKs), which lack an aspartate at a specific position in the amino acid sequence and hence are unlikely to function as kinases. Chapter 9 presents an analysis of PKLNKs with the objective of obtaining clues to their functions. Chapter 10 summarizes the main conclusions of the thesis and provides an outlook for the current study.
Chapter 1: Chapter 1 provides an introduction to cell signaling and the involvement of protein kinases in various signaling pathways, compiled from the author’s literature survey. The chapter describes the molecular events in cell signaling in prokaryotic and eukaryotic organisms. The diversity, specificity and cellular roles of protein kinases are discussed in detail.
Chapter 2: Chapter 2 describes the KinG (Kinases in Genomes) database, which was first established by Krupa et al (2004a). The KinG database is an on-line compilation of the putative Serine/Threonine/Tyrosine protein kinases encoded in the completely sequenced genomes of archaea, eubacteria, viruses and eukaryotes. The surge in genome datasets, improvements in the quality of the genomic data for various organisms and the growing number of protein domain families necessitate periodic updates of the KinG database. The updated version of KinG holds information on protein kinases for 483 organisms (Anamika et al, 2008c). The availability of a draft version of the human genome data in 2001 enabled recognition of the repertoire of human protein kinases (Krupa and Srinivasan, 2002a; Manning et al, 2002; Kostich et al, 2002). Over the last 7 years the human genomic data have been refined, and the data available at present are of much higher quality than those available in 2001. By examining the latest version of the human genome data, 46 new human protein kinase splice variants have been identified that were not recognized in the earlier studies on the human kinome. Improper regulation or mutant forms of many of these newly identified protein kinase splice variants are directly involved in various diseases such as different kinds of cancer, Severe Combined Immunodeficiency Disease (SCID) and Huntington disease. In addition, abnormal forms of the mouse orthologues of some of the newly identified human kinase splice variants are known to cause various diseases in mice. This raises the possibility that the human orthologues play similar roles in disease processes. Such observations, and a detailed analysis of these protein kinase splice variants, could have a profound influence on drug design and development against various diseases.
Chapter 3: Investigations on the identification and analysis of protein kinases encoded in the genome of chimpanzee (chimp) are discussed in Chapter 3. Further, the kinome complement of chimp has been compared with that of its close evolutionary relative, human (Anamika et al, 2008b). The shared core biology between chimp and human is characterized by many orthologous protein kinases that are involved in conserved pathways. Domain architectures specific to chimp or to human kinases have been observed. Chimp kinases with unique domain architectures are characterized by the deletion of one or more non-kinase domains present in the human kinases. Interestingly, the chimp counterparts of some of the multi-domain human kinases have identical domain architectures but carry a protein kinase-like non-kinase (PKLNK) domain in place of the kinase domain. Remarkably, for 160 out of 587 chimpanzee kinases no human orthologue with sequence identity greater than 95% could be identified. Variations in chimpanzee kinases compared to human kinases are also brought about by differences in the functions of the domains tethered to the catalytic kinase domain. For example, the heterodimer-forming PB1 domain, which is related to the fold of the ubiquitin / Ras-binding domain, is seen uniquely tethered to a PKC-like chimpanzee kinase. Although chimpanzee and human have a close evolutionary relationship, there are chimpanzee kinases with no close counterpart in human, suggesting differences in their functions. This chapter provides a direction for experimental analysis of human and chimpanzee protein kinases in order to enhance our understanding of their specific biological roles.
Chapter 4: Chapter 4 describes a genome-wide comparative analysis of the protein kinases encoded in the genomes of two apicomplexans, Plasmodium falciparum (P. falciparum) (3D7 strain) and Plasmodium yoelii yoelii (P. yoelii yoelii) (17XNL strain), which are the causative agents of malaria in human and rodent respectively (Anamika and Srinivasan, 2007). Sensitive bioinformatics techniques enabled the identification of 82 and 60 putative protein kinases in P. falciparum and P. yoelii yoelii respectively. These protein kinases have been classified further into subfamilies based on the extent of sequence similarity of their catalytic domains (Hanks et al, 1988). The most populated kinase subfamilies in both Plasmodium species correspond to the CAMK and CMGC groups. Analysis of domain architectures enabled the detection of uncommon domain organisations in kinases of both organisms, such as a kinase domain tethered to EF hands as well as to a pleckstrin homology domain. Components of the MAPK signaling pathway are not well conserved in Plasmodium species. Such observations suggest that Plasmodium protein kinases are highly divergent from those of other eukaryotes. A trans-membrane kinase with 6 membrane-spanning segments in P. falciparum seems to have no orthologue in P. yoelii yoelii. Nineteen P. falciparum kinases (Anamika et al, 2005; Anamika and Srinivasan, 2007) have been found to cluster separately from P. yoelii yoelii kinases, and hence these kinases appear to be unique to the P. falciparum genome. Only 28 orthologous pairs of kinases could be identified between these two Plasmodium species. Comparative kinome analysis of the two Plasmodium species has thus provided clues to the functions of many protein kinases based upon their classification and domain organisation, and also indicates marked differences even between two Plasmodium species.
Chapter 5: Identification and analysis of the repertoire of protein kinases in the intracellular parasite Entamoeba histolytica (E. histolytica) using sensitive sequence and profile search methods forms the basis of Chapter 5. A systematic analysis of a set of 307 protein kinases in the E. histolytica genome has been carried out by classifying them into the subfamilies originally defined by Hanks and Hunter (Hanks et al, 1988) and by examining the functional domains that are tethered to the catalytic kinase domains (Anamika et al, 2008a). Compared to those of other eukaryotic organisms, protein kinases from E. histolytica vary in their domain organisation and display features that may have a bearing on the unusual biology of this organism. Some of the parasite kinases show high sequence similarity in the catalytic domain region to the calcium/calmodulin dependent protein kinase subfamily. However, they are unlikely to act like calcium/calmodulin dependent kinases, as they lack the non-catalytic domains characteristic of such kinases in other organisms. Such kinases form the largest subfamily of protein kinases in E. histolytica. Interestingly, a Protein Kinase A/Protein Kinase G-like hybrid kinase subfamily member is tethered to a pleckstrin homology domain. Although potential cyclins and cyclin-dependent kinases could be identified in the genome, the likely absence of other cell cycle proteins suggests an unusual cell cycle in E. histolytica. Unusual features recognized in the analysis include the absence of Mitogen Activated Protein Kinase Kinase (MEK) from the Mitogen Activated Protein Kinase signaling pathway and the identification of trans-membrane kinases whose catalytic kinase region shows good sequence similarity to Src kinases, which are usually cytosolic. Sequences that could not be classified into known subfamilies of protein kinases have unusual domain architectures.
Many such unclassified protein kinases are tethered to domains which are cysteine-rich and to domains known to be involved in protein-protein interactions. The current chapter on kinome analysis of E. histolytica suggests that the organism possesses a complex protein phosphorylation network that involves many unusual protein kinases.
Chapter 6: Protein kinases phosphorylating Serine/Threonine/Tyrosine residues in several cellular proteins exert tight control over the biological functions of those proteins. They constitute the largest protein family in most eukaryotic species. Classification based on the sequence similarity of their catalytic domains results in the clustering of protein kinases sharing gross functional properties into various subfamilies. Many protein kinases are associated with, or covalently tethered to, domains that serve as adapter or regulatory modules, aid substrate recruitment and specificity, and also serve as scaffolds. Hence the modular organisation of protein kinases serves as a guide to their molecular interactions, as discussed in Chapter 6. Recent studies on the repertoires of protein kinases in eukaryotes have revealed a wide spectrum of domain organisation in model organisms across various subfamilies. The occurrence of organism-specific novel domain combinations suggests functional diversity achieved by protein kinases in order to regulate a variety of biological processes. In addition, the domain architectures of protein kinases reveal the existence of hybrid protein kinase subfamilies and their emerging roles in the signaling of eukaryotic organisms. The repertoire of non-kinase domains tethered to multi-domain kinases in the higher eukaryotes is also discussed in this chapter. Similarities and differences in the domain architectures of protein kinases in these organisms indicate conserved and unique features that are critical to functional specialization.
Chapter 7: Chapter 7 describes a systematic classification of the Serine/Threonine protein kinases encoded in archaeal and eubacterial genomes. The majority of the Serine/Threonine protein kinases identified in archaeal and eubacterial genomes could not be classified into any of the well-known subfamilies of protein kinases (Hanks et al, 1988), suggesting that they have diverged from the kinases of eukaryotes. The extensive prokaryotic Serine/Threonine protein kinase dataset obtained from KinG (Krupa et al, 2004a; Anamika et al, 2008c) has provided an opportunity to classify these kinases, by sequence identity based clustering, into three main categories: 1) Species/Genus-specific clusters: these contain members from a particular species or genus of eubacteria or archaea, suggesting that these Serine/Threonine protein kinases are required for certain lineage-specific functions. 2) Organism-specific clusters: these have members from certain specific types of organisms, which suggests a role for these Serine/Threonine protein kinases in some specific function carried out by a limited set of prokaryotes. 3) Organism-diverse clusters: these suggest a common function performed by such kinases in a wide variety of organisms.
Interestingly, the occurrence of several species/genus-specific or organism-specific subfamilies of prokaryotic kinases contrasts with the classification of eukaryotic protein kinases, in which most of the well-populated subfamilies occur across several eukaryotes. A function-based classification has also been proposed, which indicates that the members of each cluster have a specific function to perform. In this analysis, almost 50% of the “clusters” obtained have only one member, suggesting their sequence divergence and probably functional divergence. Many prokaryotic Serine/Threonine protein kinases exhibit a wide variety of modular organisation, which indicates a degree of complexity and protein-protein interaction in the signaling pathways of these microbes.
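The identity-based grouping described for Chapter 7 can be illustrated with a minimal sketch. The abstract does not state the clustering algorithm or the cut-off used, so the single-linkage rule, the 80% threshold, the sequence names and the toy pre-aligned sequences below are all assumptions made purely for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def cluster_by_identity(seqs, threshold):
    """Single-linkage clustering (an assumed rule, not the thesis method):
    two sequences share a cluster if a chain of pairs with
    identity >= threshold connects them."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters first, ties broken by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))


# Hypothetical toy sequences (not real kinases), already aligned.
toy_seqs = {
    "kinA": "MKVLGAGSFG",
    "kinB": "MKVLGSGSFG",
    "kinC": "MRILGKGAYG",
    "kinD": "QWERTYIPAS",
}

clusters = cluster_by_identity(toy_seqs, threshold=80.0)
singletons = [c for c in clusters if len(c) == 1]
print(clusters)
print(f"{len(singletons)} of {len(clusters)} clusters are singletons")
```

In this toy input only kinA and kinB exceed the cut-off, so the other two sequences end up as single-member clusters, which is the kind of outcome the chapter reports for almost half of the prokaryotic clusters.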
Chapter 8: A wide spectrum of protein kinases belonging to different Hanks and Hunter groups and subfamilies has been identified in various eukaryotes. However, the specific biological targets (substrates) of many protein kinase subfamilies are still unknown, and this is an active area of research. In the analysis reported in Chapter 8, an attempt has been made to understand protein kinase-substrate interactions and to predict substrate consensus sequences by analyzing known 3-D structures of complexes of kinases with peptide substrates/pseudosubstrates. Considering protein kinase ternary complex structures in their active states, it has been observed that the kinase residues interacting with constrained substrate residues are at topologically equivalent positions, even though the kinases belong to different Hanks and Hunter subfamilies. It has also been observed that the residues in a given kinase subfamily that interact with consensus substrate residues are usually conserved across homologues. Interestingly, in Protein Kinase B and Phosphorylase Kinase subfamily homologues, residues interacting with unconstrained substrate residue(s) are not well conserved even within the kinase subfamily, suggesting different evolutionary rates among substrate-interacting residues. This result is anticipated to further our understanding of protein kinase-substrate relationships, which is likely to be helpful in drug design.
Chapter 9: Protein Kinase-Like Non-kinases (PKLNKs) are closely related to protein kinases, but they lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases. PKLNKs have been analyzed (Chapter 9) with the objective of obtaining clues about their functions. Using various sensitive sequence analysis methods, 82 PKLNKs have been recognized in four higher eukaryotic organisms, namely Homo sapiens, Mus musculus, Rattus norvegicus and Drosophila melanogaster. On the basis of their domain combinations and the functions of the tethered domains, PKLNKs have been classified into four main categories: 1) ligand-binding PKLNKs; 2) PKLNKs having an extracellular protein-protein interaction domain; 3) PKLNKs involved in dimerization; 4) PKLNKs with a cytoplasmic protein-protein interaction module. While members of the first two classes have a transmembrane domain tethered to the PKLNK domain, members of the other two classes are entirely cytoplasmic. The classification scheme is intended to provide a convenient framework for classifying PKLNKs from other eukaryotes, and it should be helpful in deciphering their roles in cellular processes.
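The criterion used to flag PKLNKs, the absence of the catalytic aspartate in the catalytic loop, can be sketched as a check on a pairwise alignment against a reference kinase. This is only an illustration: the function names are hypothetical, the short `HRDLKPEN` segment stands in for the catalytic loop of a reference kinase, and the actual recognition in the thesis relied on sensitive sequence analysis methods that are not reproduced here.

```python
def residue_at_reference_position(ref_aln, query_aln, ref_pos):
    """Return the query residue aligned to the 1-based, ungapped position
    ref_pos of the reference, or None if the query has a gap there or the
    position lies beyond the end of the reference."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    seen = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query; not a reference position
        seen += 1
        if seen == ref_pos:
            return None if q == "-" else q
    return None


def is_pklnk_candidate(ref_aln, query_aln, catalytic_asp_pos):
    """True if the query lacks an aspartate (D) at the position aligned
    to the reference kinase's catalytic aspartate."""
    residue = residue_at_reference_position(ref_aln, query_aln, catalytic_asp_pos)
    return residue != "D"


# Illustrative catalytic-loop segment; the aspartate is residue 3 of it.
reference_loop = "HRDLKPEN"

print(is_pklnk_candidate(reference_loop, "HRDLKPEN", 3))  # aspartate present
print(is_pklnk_candidate(reference_loop, "HRNLKPEN", 3))  # aspartate replaced
```

A query is reported as a candidate both when another residue replaces the aspartate and when the alignment shows a gap at that column; a real pipeline would first establish that the sequence is otherwise kinase-like.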
Chapter 10: This chapter presents the conclusions of the entire thesis work. A summary of the major outcomes is provided, and the implications of the work for the area of signal transduction are discussed.
In addition to the above-mentioned work, the repertoires of protein kinases from two plant organisms have been studied and the kinomes have been comparatively analyzed (Krupa et al, 2006) (Appendix 1). Comparison of plant protein kinases with those of other eukaryotes revealed remarkable differences. Trans-genomic comparison of the protein kinase repertoires of Arabidopsis thaliana and Oryza sativa has enabled the identification of members that are uniquely conserved within the two species. An analysis of the domain organisation of plant protein kinases has also been carried out.
Appendix 2 presents the work done on the Entamoeba histolytica (E. histolytica) ornithine decarboxylase (ODC)-like protein, which regulates polyamine biosynthesis. DFMO (Difluoromethylornithine) is unable to inhibit the E. histolytica ODC-like protein, although it inhibits the homologues of ODC in other organisms. A modelling study has suggested that the substitution of three amino acids in the E. histolytica ODC-like protein accounts for the inability of DFMO to inhibit its activity (Jhingran et al, 2008). All the computational modelling work reported in Appendix 2 was performed by the author, while all the laboratory experiments were performed in the laboratory of the collaborator Prof. Madhubala of JNU, New Delhi.
The supplementary data pertaining to this thesis are presented on an accompanying CD. The data on the CD are organized into folders corresponding to the various chapters. | en |