New coarse grained molecular mechanics for proteins dynamics and Its application on dynamics-function correlation
Abstract
Dynamics information of proteins can be used for many purposes, ranging from evolutionary studies to biophysical analysis and drug design. Long-timescale, compact dynamics information can be obtained from coarse-grained (CG) molecular dynamics (MD) simulations. Except for the MARTINI force field, most recently developed coarse-grained force fields are in-house products, and none of them has been benchmarked against experimental data. In this thesis, a knowledge-based coarse-grained force field (CGMM) is developed to reproduce protein dynamics that match experiment, and it is then applied to different research problems. This CG force field is based on Cα atoms. The proposed potentials are derived from statistical distributions of properties such as bonded and nonbonded distances, virtual angles, and dihedrals, calculated from known unbiased protein databases. CGMM produces simulation results whose root-mean-square fluctuation (RMSF) profiles closely match (overall correlation coefficient: 0.74) those calculated from 315 unique NMR structures, each with 10 or more ensemble models. The force field is useful for long-timescale dynamics simulations of monomeric single-domain proteins and is adequate for correlating the structural and dynamic features of proteins with their biomolecular function (Chapter 2).
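The abstract does not state the functional form of the knowledge-based potentials. A common way to turn a statistical distribution into a potential is Boltzmann inversion, U(x) = -k_B T ln P(x). The Python sketch below assumes that approach; the function name `boltzmann_inversion`, the 300 K temperature, and the pseudocount are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Boltzmann constant in kJ/(mol*K), matching GROMACS energy units.
KB = 0.0083144626


def boltzmann_inversion(samples, bins, temperature=300.0, pseudocount=1e-6):
    """Convert observed values of one CG property into a potential.

    Assumes Boltzmann inversion, U(x) = -kB*T*ln P(x); this is an
    illustrative choice, not necessarily the CGMM procedure.
    Returns bin centres and the potential in kJ/mol, shifted so that
    its minimum is zero.
    """
    # Normalised probability density over the given bins.
    density, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # A small pseudocount keeps unpopulated bins finite instead of +inf.
    potential = -KB * temperature * np.log(density + pseudocount)
    # Shift so the most populated bin sits at zero energy.
    return centers, potential - potential.min()


# Usage on synthetic Calpha-Calpha virtual bond lengths in nm (not thesis data).
rng = np.random.default_rng(0)
lengths = rng.normal(0.38, 0.005, 10_000)
x, u = boltzmann_inversion(lengths, bins=np.linspace(0.35, 0.41, 61))
```

The sketch omits the Jacobian correction (for example dividing distance histograms by r² or angle histograms by sin θ) that is often applied before inversion; whether and how CGMM applies such a correction is not stated in this abstract.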
In the post-genome era, thousands of new proteins are discovered every day, yet no high-throughput experimental technique exists to assay their molecular function. Computational methods attempt to fill this gap but face major bottlenecks because of the limitations of relying on evolutionary information. This thesis presents a first proof of principle for de novo inference of protein molecular function from structural dynamics, suggesting that dynamics alone can be a general basis for inferring molecular function regardless of the overall sequence identity of the matching proteins. The work rests on the premise that proteins perform their function(s) through structural motions that saddle the chemical features required for biochemical activity. The custom-built coarse-grained force field (CGMM) is used for the molecular dynamics simulations in this work (Chapter 3).
This thesis also describes the issues encountered in attempting to implement CGMM within the GROMACS package to accelerate simulations in hybrid CPU-GPU mode. Although the attempt was unsuccessful, much was learned about the internals of GROMACS, and this knowledge is shared in the thesis. The main underlying problem was that GROMACS does not support tabulated potentials in GPU mode. Success would require deeper knowledge of the GROMACS source code and of GPU programming; however, proper documentation of the GROMACS code is not available, and the source code itself is not well documented (Chapter 4).
The first release of the CGMM force field did not consider metal ions; hence, for metalloproteins, the simulated dynamics differ from experimental results. Therefore, five of the most important and abundant metal ions (Zn²⁺, Fe²⁺, Ca²⁺, Cu²⁺, Mg²⁺) were incorporated into the force field. For this purpose, a knowledge-based interaction potential between a protein and a single metal ion was developed as an extension of the first release of CGMM (Metal-CGMM). The proposed protein-metal potential is based on statistical distributions of properties such as the distance between the ion and the Cα atom of a binding residue, the angle formed by the metal and the Cα atoms of two binding residues, and the torsion defined by the Cα atoms of three binding residues and the metal ion. As in the first release of CGMM, the statistical distributions are calculated from a known unbiased protein database. When the two force fields are compared, Metal-CGMM outperforms CGMM for 62.7% of the metalloproteins. Overall, the mean correlation coefficient between the simulated and experimental RMSF curves improves from 0.74 to 0.81 for the 315 unique NMR structures having 10 or more ensemble models. The approach is novel in that the metal-protein interaction is derived from statistical information and does not depend on charge (Chapter 5).
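The benchmark above compares RMSF profiles from simulation and from NMR ensembles through a correlation coefficient. A minimal Python sketch of that comparison follows, assuming Pearson correlation, Cα coordinates already superimposed on a common reference, and hypothetical function names; the thesis's exact alignment and averaging procedure is not described in this abstract.

```python
import numpy as np


def rmsf(coords):
    """Per-residue RMSF for an ensemble of shape (n_models, n_residues, 3).

    Assumes the models are already superimposed on a common reference frame.
    """
    coords = np.asarray(coords, dtype=float)
    mean = coords.mean(axis=0)                    # average structure
    sq_dev = ((coords - mean) ** 2).sum(axis=2)   # squared displacement per model and residue
    return np.sqrt(sq_dev.mean(axis=0))


def rmsf_correlation(sim_coords, nmr_coords):
    """Pearson correlation between simulated and NMR RMSF profiles.

    The two ensembles may have different numbers of models but must describe
    the same residues in the same order.
    """
    return float(np.corrcoef(rmsf(sim_coords), rmsf(nmr_coords))[0, 1])
```

Averaging such per-protein coefficients over a benchmark set would give a mean value of the kind quoted above (0.74 for CGMM, 0.81 for Metal-CGMM).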
Dynamics information for 5,264 proteins, obtained from 1 ps CGMD simulations with the CGMM force field, has been archived. These are single-subunit monomeric proteins with more than 30 residues and pairwise sequence identity below 30%. Among them, 447 are metalloproteins containing Zn, Fe, Ca, Cu, or Mg ions. The dynamics data of these proteins have been preprocessed and archived as backend data for the Dynfunc web server (http://pallab.serc.iisc.ac.in/dynfunc). The Dynfunc web server implements the protein function inference method. Using it, users can identify functionally important regions of a protein of unknown function: the server takes the protein's structural ensemble as input and returns the segment(s) of the input protein whose dynamics match entries in the stored protein dynamics database. The CGMM force field has also been made public through a webpage (http://pallab.serc.iisc.ac.in/CGMM), from which anyone can freely download the CGMM potential tables and programs for creating topology files for use with the GROMACS package (Chapter 6).
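The downloadable potential tables are meant for GROMACS. As we read the GROMACS reference manual, tabulated bonded potentials are supplied as table files (for example `table_b0.xvg`, passed to `gmx mdrun -tableb`) with three columns: x, V(x), and -dV/dx. The hypothetical helpers below write such a table for a bond-type potential on an evenly spaced grid, with distances in nm and energies in kJ/mol. They are not the programs distributed with CGMM, and the required range and spacing should be checked against the manual for the GROMACS version in use.

```python
import numpy as np


def bond_table_lines(x, potential):
    """Format a tabulated bond potential as rows of x, V(x), -dV/dx.

    Assumed units: x in nm, V in kJ/mol, -dV/dx in kJ/(mol*nm).
    """
    x = np.asarray(x, dtype=float)
    potential = np.asarray(potential, dtype=float)
    # Second-order central differences inside the grid, one-sided at the ends.
    force = -np.gradient(potential, x)
    return [f"{xi:12.6f} {vi:15.6e} {fi:15.6e}" for xi, vi, fi in zip(x, potential, force)]


def write_bond_table(path, x, potential):
    """Write the rows to a file such as table_b0.xvg (hypothetical helper)."""
    with open(path, "w") as fh:
        # Lines starting with '#' are treated as comments by GROMACS .xvg readers.
        fh.write("# x (nm)   V (kJ/mol)   -dV/dx (kJ/mol/nm)\n")
        fh.write("\n".join(bond_table_lines(x, potential)) + "\n")
```

Computing the force column numerically keeps it consistent with whatever tabulated V(x) is supplied, for example one obtained from a statistical distribution, at the cost of small errors at the two grid ends.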
The last chapter (Chapter 7) summarizes the thesis and gives the author's concluding thoughts on some of the issues faced. Computational work of this kind is expected to provide insight into biological phenomena using tools from physics and mathematics. It is hoped that the thesis achieves this objective and benefits the reader.

