|dc.description.abstract||The last few decades have witnessed an upsurge in the availability of large-scale data on genomes and genome-scale information. The development of methods to understand the trends and patterns in large-scale data promised to unravel the mechanisms responsible for the enormous diversity observed in biological systems. Of the many mechanisms adopted, protein-protein interactions represent one of the most common means of achieving functional diversity using a limited genetic repertoire. Protein-protein interactions bring about several fundamental cellular processes and also modulate regulation at the cellular level. Different types of protein-protein interactions have evolved to carry out myriad functions in a cell. Mainly, interactions can be permanent or transient in nature, depending on the duration of interaction. In terms of affinity, they are classified as obligate or non-obligate interactions. Structural studies on the various kinds of complexes have enabled the identification of the distinctive features characterizing the different types of complexes. Further, identifying the mechanisms involved in the evolution of protein-protein interactions is important in understanding the forces involved in their maintenance. Such studies also provide clues for the development of methods to predict protein-protein interactions from the large repertoire of sequence and structural data available.
In spite of significant understanding of various aspects of protein-protein interactions, several questions still remain unanswered. The work embodied in this thesis studies two main aspects of protein-protein interactions for various types of complexes: structural and evolutionary features. The first part of the thesis (comprising chapters 2, 3, 4 and 5) involves structural studies on the following features of protein-protein interactions: structural change, flexibility, symmetry, and residue conservation. The second part of the thesis (comprising chapters 6, 7, 8 and 9) involves the study of evolutionary aspects of protein-protein interactions, based on both large-scale data and case studies.
Chapter 1 provides a background and literature survey of the area of protein-protein interactions. The different classification schemes commonly used for describing the various protein-protein interactions are outlined. The key small-scale and large-scale experimental methods used for the identification of protein-protein interactions are described along with the details of the databases storing such experimentally derived information. Further, a comprehensive account of structural and evolutionary studies performed so far using the available data is provided. The computational (prediction) methods developed to address various aspects of protein-protein interactions are also outlined. In addition, the importance of protein-protein interactions in the context of diseases and the development of methods used to inhibit these interactions are discussed. Finally, the efforts towards design of protein-protein interactions based on the understanding of the principles governing their formation are outlined.
Chapter 2 and chapter 3 describe different aspects of transient protein-protein interactions, which form an important subset of interactions and are mainly involved in the regulation of cellular processes. In chapter 2, the structural changes occurring upon complex formation are described. In chapter 3, the roles of interface residues in the unbound form are described.
In chapter 2, the nature, extent and location of structural changes upon binding are analyzed using a non-redundant and curated dataset of 77 structures of protein-protein complexes available in both bound and unbound forms. Structural change has been captured using two metrics: protein blocks and root mean square deviation of Cα positions. The relevance of the structural changes observed in protein-protein complexes is established by comparison with a control dataset of proteins not bound to any small molecule or macromolecule. Results indicate that the observed changes are much larger than those observed due to random fluctuations. Given this background, the following observations were made on the nature, extent and location of structural changes in protein-protein complexes. (i) The nature of structural changes occurring at the interface is largely conformational with few rigid-body movements. (ii) The interfaces in the dataset are segregated into three types based on the extent of structural changes at the interface. A significant fraction of the interfaces are ‘pre-made’ (almost invariant interface) or ‘induced-fit’ (interface with large structural changes), while the rest are interfaces with moderate structural changes (‘others’). Analysis of structural changes using protein blocks reveals that pre-made interfaces are not completely invariant and are characterized by conformational changes of small magnitude. Pre-made interfaces are also observed to bind preferentially to ‘induced-fit’ or ‘other’ interfaces. These observations imply that non-obligate interactions possess in-built regulatory mechanisms in terms of conformational features to control the timely association-dissociation of transiently interacting proteins. (iii) Interestingly, significant structural changes away from the interface were observed in almost one-half of the complexes in the dataset. The analysis of these changes forms a major focus of this chapter.
Crystallographic temperature factors, crystal packing, and normal mode analysis were used to study the structural changes in these regions. Normal mode analysis, along with a literature survey, indicates that most of the structural changes observed in non-interacting surface regions may be functionally relevant, with many of them corresponding to allosteric transitions. The majority of these changes occur in signaling proteins. These observations suggest a much higher prevalence of allostery caused by protein binding than appreciated before. The dataset generated in this chapter can serve as a starting point to uncover potentially new allosteric modulators in signaling systems.
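As an illustration of the second metric mentioned for chapter 2, the following is a minimal sketch of a root mean square deviation over Cα positions. It assumes the bound and unbound structures have already been superposed and that the Cα atoms are paired in order; the function name `ca_rmsd` and the coordinate format are hypothetical and are not taken from the thesis.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the two structures are already superposed and the atoms
    are paired by position in the lists. No fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example with hypothetical coordinates (in angstroms).
unbound = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
bound = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0)]
print(ca_rmsd(unbound, bound))
```

In practice a superposition step (for example a least-squares fit) would precede this calculation; it is omitted here to keep the sketch short.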
In chapter 3, the question ‘Do residues at the interface of transient protein-protein interactions have any role in the unbound form?’ has been investigated. A high-resolution, non-redundant and functionally diverse dataset of 67 proteins with known structures available both as protein-protein complexes and in unbound form has been used. Significantly low B-factors at the core of the interface in the unbound form are observed in these structures, indicating high rigidity. Many of these residues also show B-factors comparable to those of buried residues in a protein, which formed the basis for classifying interface residues as ‘rigid’ and ‘non-rigid’. These two types of residues contribute differently to the energetics of complex formation. It is also observed that rigid interfacial residues are better conserved in evolution than non-rigid interfacial residues. Their stronger selection is highlighted by substantial conservation of microenvironment (rigidness), sequence (amino acid identity/similarity) and structure (specific side-chain orientation) in homologous proteins. These observations, coupled with the absence of any preference for a specific type of amino acid at rigid sites, indicate that rigidness is a property of the topological location of the residue at the interface and not of the type of amino acid present at that site. Analysis of the energetic parameters of these residues indicates that they contribute significantly to the molecular recognition process by reducing the entropic cost of complexation by virtue of their pre-ordered conformation.
This chapter also explores the contribution of interface residues towards the stability of the self-protein vis-à-vis that of the complex. It was seen that most interface residues contribute towards stabilizing the bound form. Interestingly, some of the interfacial residues predominantly stabilize the self-protein (the protein in which they are situated) and have a negligible contribution towards stabilization of the bound form. Thus, though these residues are located in the protein-protein interface, their main role seems to be in the stabilization of the self-protein in both the unbound and bound forms. These residues are classified as Self-protein Stabilizing Residues (SSR; 6.93%) and the rest as Neutrally Stabilizing Residues (NR; 42.60%) and Complex Stabilizing Residues (CSR; 50.46%). In addition, it was noted that the proportion of rigid residues is higher in SSR (73.33%) than in NR (58.13%) and CSR (48.90%) sites. Apart from the more favorable energetic contribution of rigid residues, compared with non-rigid residues, to the free energy of the unbound form, their predominance in SSRs suggests that rigid residues play an important role in the stabilization of the unbound form of the protein. The analyses performed in this chapter suggest that not all protein-protein interfacial residues have the major role of stabilizing the complex; some of these residues seem to have a more significant role in the unbound form than in the bound form.
Chapter 4 provides a discussion on the prevalence and relevance of asymmetry in homodimeric proteins. One of the main features characterizing homodimers is the symmetric arrangement of subunits in the three-dimensional structures. Typically, asymmetric arrangements of subunits are associated with disease states; however, they are also observed in normal homodimers performing specialized functions. Two measures to quantify structural asymmetry in homodimers (global asymmetry and interface asymmetry) have been used on a non-redundant dataset of 223 biologically relevant homodimers. The survey for globally asymmetric homodimers in the dataset indicates that they are very rare (n=11). The chapter discusses cases where a globally asymmetric arrangement of homodimeric proteins has been utilized by nature to perform certain specialized functions, such as linking of a dimeric system with a monomeric system (half-of-sites reactivity) and the transmission of signals emanating from asymmetric DNA repeats. Analysis of the 3-D structures of homologues reveals that there is no clear conservation of asymmetry. Specifically, the function of the homologous protein appears to dictate the pattern of structural organization. This chapter also describes the structural and evolutionary analyses of the 11 globally asymmetric complexes, which suggest possible mechanisms adopted by nature for preventing infinite array formation. The postulated mechanisms are: (i) In the case of homodimers associating via non-topologically equivalent surfaces in their tertiary structures, ligand-dependent mechanisms are used. (ii) In the case of homodimers associating via partially topologically equivalent surfaces, steric hindrance prevents infinite array formation.
Since most of the biologically relevant homodimers exhibit gross structural symmetry, this chapter explores further the extent of interface asymmetry in symmetric homodimers. It was observed that homodimers exhibiting grossly symmetric organization rarely exhibit either perfect local symmetry or high local asymmetry. Further, binding of small ligands at the interface does not cause any significant variation in interface asymmetry. The chapter provides new insights regarding accommodation of structural asymmetry in homodimers.
Chapter 5 describes the ability of residue conservation of interface residues vis-à-vis surface residues near interface residues to identify fitting errors caused by mis-orientation in cryo-electron microscopy maps. Cryo-electron microscopy is the most popular technique for solving structures of large assemblies in physiological conditions. However, the structures are usually solved at low resolution, whereas atomic resolution is desired to gain insights at the molecular level. Although several methods have been developed for the fitting of atomic structures or models into low-resolution cryo-electron microscopic maps, inaccurate fitting is observed in several cases. Using a non-redundant and high-resolution dataset of 125 permanent interactions and 95 transient interactions, it was observed that interface residues are significantly better conserved than residues near the interface. The chapter describes the ability of this differential conservation to identify probable mis-fittings in cryo-EM maps for three case studies: ribosomal complex from Escherichia coli, transferrin-transferrin receptor complex from Homo sapiens, and glutamate synthase complex from Azospirillum brasilense. For these cases, the use of conservation information resulted in the identification of a few residues in the vicinity of the interface with significantly higher conservation, implying their probable occurrence in the interface. These findings were verified against the high-resolution structures for two of these complexes (ribosomal assembly and transferrin-transferrin receptor complex). These analyses suggest that residue conservation information can be useful in the fitting process to arrive at a fitted structure with improved accuracy. Further, the discriminative power of the simplistic measure of residue conservation coupled with residue surface accessibility in identifying interacting residues on protein structures is also analyzed in this chapter.
Testing on a set of signaling and scaffolding molecules shows that this simplistic measure can identify interface residues in protein structures, indicating that conservation contains a distinct, although weak, signal for functional regions.
Chapters 6 to 9 discuss studies involving evolutionary aspects of protein-protein interactions. Chapter 6 describes the use of phylogenetic tree construction by the maximum likelihood method to understand the origin of the signal captured by the mirror tree approach. Mirror tree is one of the most popular approaches for identifying interacting proteins based on co-evolution. This method uses the similarity in phylogenetic trees as an indicator of protein-protein interaction. The origin of the evolutionary signal detected by the mirror tree method is a subject of some controversy. Two broad hypotheses have been postulated in the literature to explain the origin of the signal: (i) site-specific co-evolution alone and (ii) correlation induced by external factors with only minor, if any, contribution from site-specific co-evolution. In the typical mirror tree protocol, inferences from the phylogenetic tree are optional and only genetic distances are analysed. Even if the tree is constructed, usually the Neighbor-Joining approach is used. However, with the Neighbor-Joining method the inferred tree topology and genetic distances are directly linked. With maximum likelihood, the tree topology is not derived directly from the genetic distances, and therefore the contributions of the signals arising from tree topology and genetic distances can be studied separately. Tree topologies can be considered to serve as indicators of compensatory substitutions (implicated in site-specific co-evolution) as well as shared evolutionary history. Genetic distances correspond to evolutionary rates (implicated in correlation induced by external factors). Using this method, phylogenetic trees for a range of datasets of interacting and non-interacting proteins corresponding to yeast (S. cerevisiae) have been derived. The analysis performed in this chapter reveals no strong correlation between phylogenetic tree topologies, but a significant correlation between genetic distance matrices for interacting proteins.
The chapter discusses the implications of these findings and attempts to understand the origin of the signal captured by the mirror tree protocol using the following points. (i) The near lack of correlation in tree topologies is not surprising, since compensatory substitutions account for only a minority of the sites in a protein. (ii) The influence of shared evolutionary history has also been tested in the chapter by comparison of tree topologies of interacting and non-interacting proteins with the 18S rRNA tree. Tree topologies of both interacting and non-interacting proteins do not mirror the 18S rRNA tree, ruling out shared evolutionary history as the signal of correlated evolution. (iii) By contrast, the significant correlation observed between branch lengths (genetic distances) of interacting proteins in all the variant datasets demonstrates correlation between evolutionary rates, independent of evolutionary divergence. In summary, the chapter concludes by providing support for the theory of correlation induced by external factors with only minor contribution from site-specific co-evolution.
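The comparison of genetic distance matrices described above can be illustrated with a minimal sketch. It assumes that the mirror tree score is taken as the Pearson correlation between the corresponding off-diagonal entries of two distance matrices built over the same ordered set of species; the function names and the toy matrices are hypothetical, and the thesis may use a different formulation.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate two distance matrices defined over the same species order."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same set of species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical genetic distances for two protein families in three species.
family_a = [[0.0, 0.1, 0.4], [0.1, 0.0, 0.5], [0.4, 0.5, 0.0]]
family_b = [[0.0, 0.2, 0.7], [0.2, 0.0, 0.9], [0.7, 0.9, 0.0]]
print(mirror_tree_score(family_a, family_b))
```

A score near 1 indicates that the two families have accumulated change at proportional rates across the species, which is the kind of rate correlation the chapter attributes largely to external factors.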
Chapter 7 explores the homology-based transfer of interactions by quantifying the extent of retention/variation of interaction partnerships amongst a set of homologous proteins related at the SCOP family level (which indicates a clear evolutionary relationship). A large dataset of domain-domain interacting pairs (n=20,254) culled from SCOP 1.73 was used for this analysis. Analysis of this dataset shows that in ~80% of the cases, interacting partners are completely retained (evaluated as proteins belonging to the same SCOP family). If ‘common’ partnership is evaluated at the level of SCOP folds, which are known to be structurally similar although not necessarily evolutionarily related, the percentage of homologous domains with complete retention of partnership increases only by ~5%. This indicates that the presence of a common structural scaffold alone is not sufficient for interaction. Further, the chapter also describes the retention/variation in partnerships analyzed as a function of sequence divergence between the homologous proteins. It is observed that there is a higher tendency to vary interacting partners as the evolutionary divergence between the homologues increases. In spite of this, interaction partnerships appear to be retained for homologous domains irrespective of their sequence divergence if the function mandates the presence of the interaction. However, all these observations could be influenced by the incomplete nature of information on the interactions available in the structural space. This problem has been addressed in this chapter by studying variation in interaction partnerships for Saccharomyces cerevisiae proteins. Yeast was chosen since it is extensively studied and interactions are available for ~87% of proteins, yielding a comprehensive list of interactions. To study this aspect, the SCOP dataset of interacting proteins (which represents a generic dataset) was compared with interactions of homologous proteins from yeast.
The dataset of interacting proteins for yeast collated from all sources and documented in BIOGRID v50 was used. In this analysis, the proportion of homologous domains showing complete retention of interacting partners was only ~12%. This observation is the reverse of the trend observed for the dataset of homologous SCOP domains. Further analysis of homologous pairs of yeast-SCOP domains, containing only those pairs whose interacting protein families are found in both the yeast and SCOP datasets, was performed to ascertain the extent of contribution of organism-specific proteins to the variation in interaction partnership for homologous domains. The proportion of homologous domains showing complete retention of interaction partners increases to ~50% for these cases. These observations indicate that organism-specific proteins contribute significantly to the variation of interaction partnerships in homologous proteins.
The next two chapters (8 and 9) discuss two contrasting scenarios of interaction partnerships. Chapter 8 describes the study of two protein families showing variation in interaction partnerships/interface structure and analyzes the drift in protein-protein interaction surfaces in each of the cases. The analysis in this chapter is facilitated by the large number of sequences available for the case studies. The first case study involves members of the glutamine amidotransferase (GAT) superfamily of enzymes. Three remote homologues in this superfamily could also be related by sequence: intracellular protease (DJ-1/PfpI family), C-terminal domain of the small subunit of carbamoyl phosphate synthetase (Class I glutamine amidotransferase-like family), and C-terminal domain of catalase (Catalase, C-terminal domain family). In two cases, it is seen that domain recruitment influences the interacting surface (catalase, carbamoyl phosphate synthetase). The tethered domains, which are involved in interaction with the GAT domain, are from different SCOP folds, indicating that partnerships are not retained at extreme divergence. However, members of the DJ-1/PfpI family form homodimers with differing quaternary structures, i.e. different orientations of the dimers. Four members have been studied in detail in this chapter: intracellular protease (two distinct interfaces, forming a hexamer), stress-induced protein (dimer), DJ-1 protein (dimer), and sigma cross-reacting protein (dimer). Since the members are less divergent in sequence (as they are within the same family), it is possible to trace the drift in interfaces among these members based on the multiple sequence alignments of members with the differing quaternary structures and the sequences bridging them. The second case study involves analysis of the family of legume lectins, which corresponds to another set of proteins exhibiting differing quaternary structures for remarkably well conserved tertiary structures and sequences.
Analysis of variations in protein-protein interaction surfaces when they show only slight differences between homologous members indicates that the drift is gradual, as seen when tracing the dynamics of DJ-1 family members and legume lectin family members. There exist sequences containing many different intermediate combinations of the interacting residues involved in both the sets of proteins. Comparisons of homologues where an entire interface seems to be lost show a different trend (intracellular protease and DJ-1). The most prominent interacting residues show an abrupt shift between the two different subfamilies. However, inspection of the other interacting residues reveals that there is a gradual change occurring generally, although a drastic change in the important (although fewer) residues would have led to loss of the interface. In summary, analysis of the evolutionary dynamics of the consensus interface residues of different quaternary structure types of the DJ-1/PfpI family of enzymes and legume lectins shows that nature employs only the most important mutations to prevent a specific interface and form a new interface, while the rest of the positions drift and accumulate changes in the course of evolution.
Chapter 9 describes the opposite scenario, i.e. conservation of an interface even at high sequence divergence, using the RNA polymerase assembly as a case study. The multi-molecular assembly consists of four core subunits: alpha (I and II), beta, beta prime, and omega. These four subunits are common to RNA polymerase complexes of eubacteria, eukaryota and archaea. The sigma subunit aids in initiation of transcription in eubacteria (core enzyme + sigma = holoenzyme). Remarkably, prokaryotic and eukaryotic structures exhibit a high degree of structural similarity, although their sequence similarity is low (19-28% sequence identity). However, this is expected, as the obligatory interaction between the various subunits is essential to successfully carry out transcription. This chapter investigates the structural accommodation of diverse sequences at the interface of the RNA polymerase machinery of eubacteria, using sequence analysis and homology modelling. Analysis of domain composition and order of domains for the core subunits of the RNA polymerase assembly in >85 eubacterial species indicates complete conservation. However, conservation analysis of the various core subunits indicates that the interface residues are more divergent for the alpha and omega subunits. Although beta and beta prime are generally well conserved, the residues involved in interaction with the divergent subunits (i.e. alpha, omega) are not conserved. Insertions/deletions are also observed near the interacting surfaces even in the most conserved subunits (beta and beta prime). The chapter describes the homology modeling of three divergent RNA polymerase complexes from Helicobacter pylori, Mycoplasma pulmonis and Onion yellows phytoplasma, highlighting that insertions/deletions can be accommodated near the interface as they generally occur at the periphery.
The development of a generalized matrix capturing preferences of interface environment is documented, along with results comparing the similarity of the modeled interfaces to that of the template interface. It is observed that the modeled interfaces are physico-chemically similar to the template interfaces in Thermus thermophilus, indicating that nature accommodates substantial substitutions and insertions/deletions at and near the interface in order to retain the structure of the obligate complex, which is indispensable for the process of transcription.
The main conclusions of the entire thesis work are summarized in chapter 10, which also places the work in the context of the field of protein-protein interactions. The new insights obtained for transient interactions and homodimers from structural studies are highlighted. The application of evolutionary conservation to improve fitting of atomic structures in cryo-electron microscopic maps is discussed. The understanding gained from the study of different evolutionary aspects of protein-protein interactions, ranging from correlated evolution to evolutionary dynamics of variations in interactions, is also highlighted.
Appendix 1 of this thesis describes the homology modeling of the hexameric form of AAA ATPase domain of spastin along with associated structural analysis.||en_US