Secondary Structures in Proteins: Identification and Analyses
Proteins are large biomolecules consisting of one or more long chains of amino acid residues. They perform a vast array of functions within living organisms. In this thesis, we present analyses of different secondary structural elements (SSEs) in proteins and the methods developed for this purpose. Using only geometric parameters, a program for the identification of SSEs has been developed that is more sensitive to local structural variations. An understanding of the factors that determine the length, geometry and location of a particular SSE in a protein is essential to fully appreciate their respective roles in protein structures. A comparative analysis of the geometry of α-helices identified by different programs showed that STRIDE-assigned α-helices are more kinked. The conformation of Pro residues in α-helices has also been studied in detail. Several interesting conclusions are drawn from the comprehensive study of π-helices and PolyProline-II (PPII) helices. In the subsequent paragraphs, a brief summary of each chapter is provided. The Introduction (Chapter 1) summarizes the relevant literature, which includes both experimental and theoretical studies explaining the structural and functional importance of SSEs in proteins, and lays down the background for the subsequent chapters of the thesis. The major questions addressed and the main goals of this thesis are described to set the stage for the detailed discussions. The methodologies involved are discussed in Chapter 2. These include the protocol used for preparing non-redundant datasets of protein structures, the statistical methods used to test the significance of position-wise amino acid propensities and the different programs used during the course of the present investigations. SSEs play an important role in the folding of proteins. Identification of these SSEs in proteins is a common yet important concern in structural biology.
Chapter 3 details a new method, ASSP (Assignment of Secondary Structure in Proteins), which uses only the path traversed by the Cα atoms of consecutive residues. The algorithm is based on the premise that a protein structure can be divided into continuous or uniform stretches, which can be defined in terms of helical parameters; depending on the values of these parameters, the stretches can be classified into different SSEs, viz. α, 3₁₀, π, extended β-strands, PPII and other left-handed helices. The methodology was validated using an unbiased clustering of these parameters for a dataset of 1008 protein chains, which indicated that there are seven well-defined clusters associated with different SSEs. Apart from α-helices and extended β-strands, 3₁₀- and π-helices were also found to occur in considerable numbers. Various analyses demonstrated that ASSP was able to discriminate non-α-helical segments from flanking α-helices, whereas other algorithms often identified such segments as part of the α-helix. The standalone version of the program for the Linux and Windows operating systems is freely downloadable, and the web server version is available at http://nucleix.mbu.iisc.ernet.in/assp/index.html. Among all SSEs in proteins, α-helices are relatively well defined. However, a precise quantitative estimate of their geometrical features and the identification of their terminal residues are difficult. In Chapter 4, a set of major changes and updates implemented in the algorithm of the in-house program for the analysis of the geometry of helices in proteins (HELANAL) is discussed in detail. The program defines the helix parameters based on the path traced by the Cα atoms alone and classifies the geometry of the helices as linear, curved, kinked or unassigned by fitting a least-squares 3D line and sphere to the local helix origin points (LHOP).
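The line-fitting step mentioned above can be illustrated with a minimal sketch. It is not the HELANAL code: it simply fits a least-squares 3D line to a set of points by singular value decomposition and reports the root-mean-square perpendicular deviation, a quantity that tends to be small for a straight axis and larger for a bent one. The function name `fit_line_3d` and the two sets of axis points are hypothetical, and no classification thresholds are shown because the text does not state them.

```python
import numpy as np

def fit_line_3d(points):
    """Fit a least-squares line to 3D points.

    Returns (centroid, unit direction, RMS perpendicular deviation).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # The first right-singular vector is the direction of largest variance;
    # a line along it through the centroid minimises the summed squared
    # perpendicular distances to the points.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Subtract the component along the line to obtain perpendicular offsets.
    along = centered @ direction
    perp = centered - np.outer(along, direction)
    rmsd = float(np.sqrt((perp ** 2).sum(axis=1).mean()))
    return centroid, direction, rmsd

# Hypothetical axis points (in Å): a perfectly straight axis and a bent one.
straight = [(0.0, 0.0, 1.5 * i) for i in range(8)]
bent = [(0.05 * i * i, 0.0, 1.5 * i) for i in range(8)]

_, dir_straight, rmsd_straight = fit_line_3d(straight)
_, dir_bent, rmsd_bent = fit_line_3d(bent)
print(f"straight: RMSD = {rmsd_straight:.3f} Å")
print(f"bent:     RMSD = {rmsd_bent:.3f} Å")
```

A sphere fit to the same points would be needed to separate curved from kinked geometries; that step is omitted here.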
The geometry assigned using HELANAL-Plus is independent of the orientation of the helix in 3D space and does not depend on the database from which the structure is taken. The program is available both as a web server and as a standalone program, and the helices can be viewed in the JmolApplet along with the best-fit helix axis, which makes HELANAL-Plus useful for analysing inter-helix interactions and packing. The utility of the web server has been increased by incorporating SSE assignment programs such as ASSP, DSSP and STRIDE. Pro-kinked helices and their correlation with the UP and DOWN conformations of Pro were studied in more detail. HELANAL-Plus is available at http://nucleix.mbu.iisc.ernet.in/helanalplus/index.html. Linux/Unix- and Windows-compatible executables are also available for download. Analysis of kinks in a dataset of helices indicated a correlation with a large radius of the cylinder encompassing the residue at which the kink is observed, and ASSP often identified such regions as π-helices. Detailed analysis of π-helices has been limited because different algorithms identify them only infrequently. ASSP identified 659 π-helices in 3582 protein chains, solved at a resolution ≤ 2.5 Å and validated by MolProbity. Chapter 5 reports a detailed study of the functional and structural roles of π-helices, along with the position-wise amino acid propensities within and around them. These helices were found to range from 5 to 18 residues in length, with the average twist and rise being 85.2° ± 7.2° and 1.28 Å ± 0.31 Å respectively. The investigation of π-helices showed that they occur mostly in conjunction with α-helices. The majority of π-helices with flanking α-helices at both termini were found to be conserved across a large number of structures within a protein family and to induce local distortions in the neighbouring α-helices.
The presence of a π-helical fragment leads to appropriate orientation and positioning of the constituent residues, and hence facilitates favourable interactions and helps in the proper folding of the protein chain. The comprehensive analyses of position-wise amino acid propensities within and around π-helices showed unique preferences that differ from those of α-helices. Additionally, and most importantly, the study brought to light the influence of π-helices on the residue preferences in preceding or succeeding α-helices and vice versa. The study of another important SSE in proteins, PPII helices (Chapter 6), was prompted by their frequent occurrence and initiated with the aim of understanding their structural and functional roles. A PPII helix is defined as an extended, flexible left-handed helix without intra-helical H-bonds, and such helices were found to occur very frequently. ASSP identified 3597 PPII helices in 3582 protein chains. Although PPII helices are far less common than α-helices and β-strands, they still outnumber π-helices. The analyses of PPII helices revealed that almost 50% of them do not contain Pro residues and that they show a preference for polar residues. PPII helices were found in conjunction with the major SSEs and often connect them. These helices range from 3 to 13 residues in length, with the average twist and rise being -121.2° ± 9.2° and 3.0 Å ± 0.1 Å respectively. The analysis of various non-bonded interactions revealed the frequent presence of C-H…N and C-H…O interactions. The analysis of amino acid preferences within and around PPII helices showed that aromatic residues are avoided within the helix, while Gly, Asn and Asp residues are preferred in the flanking regions. Detailed analyses of the various functional and structural roles mediated by PPII helices have also been carried out.
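To make the twist and rise values quoted for π-helices and PPII helices concrete, the sketch below computes both parameters from four consecutive Cα positions. It uses a standard geometric construction for points on a regular helix and is not presented as ASSP's own procedure. The function names are hypothetical, and the test coordinates are idealised helices with illustrative radii rather than real protein data. A negative twist is reported for a left-handed helix, matching the sign convention used for PPII helices above.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def twist_and_rise(p1, p2, p3, p4):
    """Return (twist in degrees, rise per residue) for four consecutive points.

    Assumes the points are not collinear; a perfectly straight segment has no
    defined twist and would cause a division by zero here.
    """
    # For points on a regular helix, p(i-1) - 2*p(i) + p(i+1) is perpendicular
    # to the helix axis and points from p(i) towards it.
    v1 = tuple(a - 2 * b + c for a, b, c in zip(p1, p2, p3))
    v2 = tuple(a - 2 * b + c for a, b, c in zip(p2, p3, p4))
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    twist = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    # The axis direction is perpendicular to both vectors.
    axis = _cross(v1, v2)
    length = _norm(axis)
    axis = tuple(x / length for x in axis)
    # Rise is the projection of the p2 -> p3 step onto the axis.
    step = tuple(b - a for a, b in zip(p2, p3))
    rise = _dot(step, axis)
    # A negative projection means the helix is left-handed.
    if rise < 0:
        twist, rise = -twist, -rise
    return twist, rise

def ideal_helix(radius, twist_deg, rise, n=4):
    """Generate n points on a regular helix (illustrative test input only)."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i)
            for i in range(n)]

# Idealised right-handed and left-handed test helices (radii are illustrative).
alpha_like = twist_and_rise(*ideal_helix(2.3, 100.0, 1.5))
ppii_like = twist_and_rise(*ideal_helix(1.4, -120.0, 3.0))
print("alpha-like: twist = %.1f°, rise = %.2f Å" % alpha_like)
print("PPII-like:  twist = %.1f°, rise = %.2f Å" % ppii_like)
```

Sliding such a four-residue window along a chain gives one twist and rise value per step, which is the kind of local parameter set that can then be grouped into uniform stretches.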
Identification and analysis of non-bonded interactions within a molecule and with the surrounding molecules are an essential part of structural studies. Given the importance of these interactions, we have developed a new algorithm named MolBridge, which is described in detail in Chapter 7. MolBridge is an easy-to-use algorithm based purely on geometric criteria that can identify all possible non-bonded interactions, such as hydrogen bonds, halogen bonds, cation…π, π…π and van der Waals interactions, in small molecules as well as biomolecules. Various features available in the web server make it more user-friendly and interactive. The Unix/Linux version of the program is freely downloadable and the web server version is available at http://nucleix.mbu.iisc.ernet.in/molbridge/index.php. The overall conclusions from the current investigation and possible future directions are presented in Chapter 8. Our findings suggest that the path traversed by the Cα atoms is sufficient for the identification of SSEs. We believe that the algorithms developed (ASSP, HELANAL-Plus and MolBridge) can provide a better understanding of the finer nuances of protein secondary structures. ASSP can make an important contribution to the understanding of comparatively less frequent structural motifs and to the identification of novel SSEs. The comprehensive study of π-helices gives in-depth insight into them, and the analysis of interspersed π-helices gives a comprehensive understanding of the local deformations and variations in helical segments. Apart from the studies embodied in the thesis, the author has been involved in a few other studies, which are provided as appendices: Appendix A describes a program, RNAHelix, which can regenerate duplexes from the dinucleotide step and base pair parameters for a given double-helical DNA or RNA sequence. It can also be used to generate or regenerate duplexes with non-canonical base pairing.