Interdisciplinary mathematical sciences (IMS)

https://etd.iisc.ac.in/handle/2005/29
2026-04-08T12:01:45Z

Analysis, Modelling and Prediction of Protein-Protein Interfaces and Application to Biological Problems
https://etd.iisc.ac.in/handle/2005/6798
Parvathy, J

Proteins play crucial roles in many biological processes, such as signalling, catalysis of metabolic processes, immune response, and molecular transport. To perform this wide range of functions, proteins interact with other biomolecules. Characterizing a protein-protein interface is therefore vital for understanding binding affinity and function, and it is of utmost importance in experimental mutagenesis studies, in predicting protein-protein networks, in designing drug targets, and in engineering proteins. Several experimental methods, such as X-ray crystallography and Nuclear Magnetic Resonance (NMR), can identify the interface residues in a complex. However, these methods are time-consuming and labour-intensive, and they come with challenges: the protein may not be amenable to the experimental conditions needed for structure determination, high-quality crystals can be difficult to obtain, and protein samples may be prone to aggregation during expression and purification. Hence, computational approaches to interface identification have become crucial in complementing the experimental ones. The interface residues of a protein-protein complex are assumed to have the following two properties: (a) they always interact with a residue of a partner protein, which forms the basis for distance-based interface residue identification methods, and (b) they are solvent-exposed in the isolated form of the protein and become buried in the complex form, which forms the basis for Accessible Surface Area (ASA)-based methods.
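The two criteria above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the 4.5 Å distance cutoff, the 1.0 Å² ASA-loss threshold, and the toy coordinates and ASA values are all assumptions chosen for illustration, and the per-residue ASA values are taken as given rather than computed.

```python
import numpy as np

def distance_interface(coords_a, res_ids_a, coords_b, cutoff=4.5):
    """Residues of protein A with any atom within `cutoff` angstroms of an atom of B.

    The cutoff value is an illustrative assumption, not taken from the abstract.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # Pairwise atom-atom distances, shape (n_atoms_a, n_atoms_b).
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    close_atoms = (dist <= cutoff).any(axis=1)
    return {rid for rid, close in zip(res_ids_a, close_atoms) if close}

def asa_interface(asa_isolated, asa_complex, min_loss=1.0):
    """Residues whose accessible surface area drops by more than `min_loss` on binding.

    The threshold is an illustrative assumption.
    """
    return {rid for rid, asa in asa_isolated.items()
            if asa - asa_complex[rid] > min_loss}

# Toy data: one atom per residue of protein A, a single atom for partner B.
coords_a = [[0, 0, 0], [0, 3, 1], [0, 5, 0], [0, 20, 0]]
res_ids_a = [1, 2, 3, 4]
coords_b = [[0, 0, 4]]
asa_isolated = {1: 40.0, 2: 0.0, 3: 30.0, 4: 50.0}
asa_complex = {1: 5.0, 2: 0.0, 3: 20.0, 4: 50.0}

by_distance = distance_interface(coords_a, res_ids_a, coords_b)
by_asa = asa_interface(asa_isolated, asa_complex)
only_distance = by_distance - by_asa   # in contact, but buried in both forms
only_asa = by_asa - by_distance        # loses exposure without a close contact
```

In this toy case residue 1 satisfies both criteria, residue 2 is found only by distance, and residue 3 only by ASA loss, which is the kind of disagreement between the two definitions that the set differences expose.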
The first study interrogates this popular assumption by recognizing interface residues in protein-protein complexes through these two methods. The results show that a few residues are identified uniquely by each method, and that the extent of conservation, the propensities, and the contribution to the stability of the protein-protein interaction vary substantially between these residues. The case study analyses showed that interface residues unique to the distance method participate in crucial interactions that hold the proteins together, whereas interface residues unique to the ASA method have a potential role in the recognition, dynamics, and specificity of the complex and can also be hotspots. Overall, our study recommends applying both the distance and ASA methods, so that interface residues that are missed by either method but are crucial to the stability, recognition, dynamics, and function of protein–protein complexes are identified in a complementary manner. Hotspots are interfacial residues in protein-protein complexes that contribute significantly to complex stability. In the second project, we introduce the concept of secondary shell hotspots: hotspots that are uniquely identified by the distance-based approach, that stay buried in both the bound and isolated forms of the protein, and that nevertheless form direct interactions with the partner protein. From the analysis of a dataset curated from docking benchmark dataset v5.5, we find that secondary shell hotspots are more evolutionarily conserved, have higher Chou-Fasman propensities for hydrophobic and long-chain residues, and have distinct interaction patterns compared to other hotspots. From detailed case study analyses, we observe that the interaction network formed by the secondary shell hotspots is crucial for complex stability and activity, and that they are potentially allosteric propagators that bridge interfacial and non-interfacial sites in the protein.
Their mutations to any other amino acid types cause significant destabilization. Overall, this study sheds light on the uniqueness and importance of secondary shell hotspots in protein-protein complexes. Impaired PPIs can cause many diseases, such as neurological disorders and cancer. Moreover, the conserved structure of hotspots and their tremendous impact on the binding energy have made them attractive medical targets for designing inhibitor drugs. Such inhibitors can avoid unwanted protein-protein interactions and more effectively treat various diseases. Such drugs are designed by targeting hotspots with virtual ligand screening and template-directed combinatorial chemistry. In addition, prior knowledge of hotspot residues has been extensively employed in protein−protein docking. Hence, this study on secondary shell hotspots would help design drugs and provide better restraints for docking analyses. The learnings from the first two projects have been applied to different collaborative projects. We developed a machine learning-based algorithm to predict the interacting pair of residues given the unbound structures of the constituent proteins in the complex (with Mr. Adithyan Unni). We used the docking benchmark dataset v5.5 to train the model using the CatBoost algorithm that applies gradient-boosting on decision trees. Our model exhibits performance comparable with other state-of-the-art methods. In another study, the experimental studies conducted by Prof. Ravi Sundaresan and the team showed that the NAD+-dependent protein deacetylase, SIRT6, has a crucial role in negatively regulating fatty acid uptake in cardiomyocytes. This is achieved by transcriptionally regulating the fatty acid transporters through SIRT6’s binding with the transcription factor PPARγ. Hence, we derived an in silico docked model of the SIRT6/PPARγ complex and suggested the most likely binding pose. 
Overall, the study could aid in exploiting SIRT6 as a potential therapeutic target for protecting the heart from metabolic diseases. In another collaborative project on toxin-antitoxin systems in Mycobacterium tuberculosis (Mtb), the growth inhibition and rescue experiments by Prof. Ramandeep Singh and the team showed the possibility of cross-talk between specific RelE toxins and specific VapB antitoxins. Through sequence and structure-based in silico analyses, we identified certain residues that could be important at the non-cognate interface. Further, using in silico docking, we derived models for the non-cognate complexes and suggested a few mutations that could disrupt this cross-talk. Overall, we hope the studies conducted as part of this thesis help better understand the interface of protein-protein complexes and further aid in designing experimental mutagenesis, with a wide range of applications in drug design, predicting protein signalling, and 3D modelling of large macromolecular complexes.

Compiler controlled Task Management in Runtime Systems for Dynamic Dataflow Model of Execution
https://etd.iisc.ac.in/handle/2005/5593
Madhu, Kavitha T

For the past 40 years, relentless focus on Moore's Law transistor scaling has provided ever-increasing transistor performance and density. An ever-increasing demand for large-scale parallelism has driven hardware designers to fit more cores per die, reaching the physical limits of power dissipation. Attention has now turned to low-power, light-weight cores such as ARM, with thousands of wimpy and brawny cores per die. The responsibility of running applications on such cores, however, still remains with the operating system, adding to the overheads of an otherwise light-weight shared-memory based application.
A massively parallel low-power chip with high scalability as a building block for a larger compute infrastructure is the preferred design. Such chips are expected to sport features such as inexpensive computation and communication along with a low-latency runtime interface. State-of-the-art runtime systems incur significant performance penalties as they are tied to traditional parallel computing models. Performance concerns have led researchers to consider alternative models of computing such as Dynamic Dataflow. Such models have proven to be more scalable and power-budget friendly, making parallelism easier to exploit even in irregular applications that are usually tricky to parallelize and scale. An ideal runtime implementation exposes runtime management primitives to the software abstraction layer. We introduce one such distributed hardware runtime for a massively parallel manycore processor (called REDEFINE) that exposes parallelism handles as instructions in its ISA. We present a compilation strategy that utilizes these primitives to effectively manage tasks on the hardware. REDEFINE's compiler controls task creation and deletion, manages communication between tasks, and balances task loads on REDEFINE's distributed execution fabric.

A Digital Health framework for Personalized medicine: Development of a new algorithm for identifying concise actionable driver gene panels and application in Cancer and Rheumatoid Arthritis
https://etd.iisc.ac.in/handle/2005/6921
Sriraman, Shrisruti

Precision medicine, enabled by next-generation sequencing (NGS), has shown tremendous potential for use in a clinical setting for disease diagnosis and treatment.
The biggest promise is to make treatments more precise and tailored for individual patients, departing from the one-size-fits-all approach. However, the wider translation of genomic panels into clinical practice and its implementation into a digital health framework is met with challenges. Conventional gene panels based on frequently occurring mutations benefit only a subset of patients. Hence, there is an urgent need to expand the scope of this to all patients, for which new methods are required to be developed so as to identify key actionable gene panels in all patients. We address this gap in this work and present a new algorithm, PreDDs (Precision Driver Panels), to identify gene mutations that drive the disease by integrating genomics, transcriptomics, genome-wide protein-protein interactions and precision networks. Our unbiased network method combines both the gene alterations and the perturbed gene expressions in the functional context to give a comprehensive molecular-level view of the pathological drivers in individual patients, which we refer to as ipanels. Our algorithm shows superior performance when compared to the existing methods. It gives patient-wise concise gene panels that encapsulate major molecular perturbations in the disease. We observe that PrOPs is able to capture many gold-standard genes that represent the altered pathways in the diseases studied. Further, we add a computational workflow to identify ‘actionable’ genes from the panels and associate them with known ‘actions’ in terms of the available drugs to modify the effect of the alterations in the panel genes. 
This end-to-end pipeline constitutes a framework in digital health and enables its application in a clinical setting, where the pipeline takes in exome and bulk transcriptome sequences as inputs and produces a personalized report indicating the key driver genes in that patient as well as druggable genes with suggested action that guides the clinician in the decision-making process. We developed and tested the algorithm on individual patient data from 6 different cancer cohorts from The Cancer Genome Atlas (TCGA) - Breast Adenocarcinoma, Colon Adenocarcinoma, Glioblastoma, Hepatocellular Carcinoma, Lung Adenocarcinoma and Skin Melanoma. This resulted in gene panels that were unique to the individuals in the cohort, with the panel sizes ranging from 1 to 9. Apart from identifying most of the known driver genes like TP53, APC, EGFR, RAS and RAF, we also identified a number of rare driver genes that would have been otherwise missed at the population level. This includes genes like ID2 in melanoma that was mutated in only one patient sample but was correctly identified by PrOPs to be a driver gene in that patient. This also happens to be a COSMIC signature gene in Melanoma. We extend the functional significance of the panel genes to the phenotype by formulating a risk score based on the network connectivity of the panel genes and the perturbed genes and the respective hazard ratio calculated from the studied cohort. This score gives insights into the survival status of the individual. Further, 92% of the patients studied had at least one actionable gene in their panel. Next, we expand the framework to generalize it to other diseases outside the oncology domain and select an autoimmune disease - Rheumatoid Arthritis (RA) as an example. We refer to the expanded framework as PreDD (Precision Disease Drivers). We generated a new South Indian cohort involving patients with Rheumatoid arthritis, in collaboration with a hospital in this region. 
Variation in the genes is the major risk factor for this autoimmune disease. PreDD precisely identified genes that are relevant to the disease. We observe that PreDD genes are also gold standards studied in this disease, indicating the potential of our algorithm to be used as a general driver gene identification tool. In the final part of the thesis, we present an ordinary differential equation (ODE) model of the signalling pathways based on the panels identified above in both the Cancer and Rheumatoid arthritis cohorts. The results from the model simulations highlight the importance of the identified mutations and the effect they have on the relevant pathways and processes. Identifying the driver mutation profiles that influence the disease progression of each patient is necessary to understand their disease risk and to develop personalised treatment regimes. Individuals exhibit very high heterogeneity in their mutational profiles, making it essential to address the driver mutations in each patient. The insights gained from this thesis have a high potential to be applied to real-world data and translated to clinical practice to provide a platform for personalized care.

A divide-and-conquer distance geometry method to model protein structures from NMR spectroscopy
https://etd.iisc.ac.in/handle/2005/6674
Das, Niladri Ranjan

Nuclear Magnetic Resonance (NMR) spectroscopy provides insights into the dynamic behavior of proteins in solution, eliminating the need for crystallization. NMR experiments can spatially probe nuclei within 6 Angstroms of each other. The NMR signals are used to obtain geometric constraints consisting of distances and dihedral angles between atoms. Combined with the covalent-bond geometry, these can be used to determine the three-dimensional structure of a protein, which is essential for understanding its chemical and physiological functions.
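To make the link between distance constraints and three-dimensional coordinates concrete, the following Python sketch shows the textbook idealized case: when every pairwise distance is known exactly, coordinates can be recovered (up to rotation, translation, and reflection) from the eigendecomposition of the double-centred squared-distance matrix. This is not the method of the thesis; real NMR data supply only sparse, imprecise bounds. The function names and the random test points are assumptions for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """All-against-all Euclidean distances for an (n, d) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def embed_from_distances(dist, dim=3):
    """Recover coordinates from a complete, exact distance matrix.

    Classical multidimensional scaling: double-centre the squared distances
    to obtain a Gram matrix, then keep its `dim` largest eigenpairs.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical test: eight random points stand in for atom positions.
rng = np.random.default_rng(0)
true_coords = rng.normal(size=(8, 3))
recovered = embed_from_distances(pairwise_distances(true_coords))
max_error = np.abs(
    pairwise_distances(recovered) - pairwise_distances(true_coords)
).max()
```

The recovered coordinates reproduce the input distances to within floating-point error, even though they generally differ from `true_coords` by a rigid motion. With missing or bounded distances this direct route no longer applies.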
The technical challenge comes from the imprecise and sparse nature of the data and the potential number of configurations that scale exponentially with the protein size. This complexity renders the structure determination problem computationally intractable. In the absence of a direct approach, various heuristics are used. Protein structures are typically determined iteratively to account for errors from approximations; the errors are eliminated by cycling between structure computation and other stages in the NMR pipeline. This necessitates fast and robust structure computing algorithms. State-of-the-art techniques use molecular dynamics along with simulated annealing for structure calculation. However, they have a high computational overhead as multiple configurations are employed to avoid getting trapped in local minima of the potential energy landscape. We propose an alternative approach that emphasizes complying with the available experimental information, independent of external factors such as initial conformations and force-field parameters. Our protocol, dubbed Distance Restraints and Energy Assisted Modeling (DREAM), works primarily with the available distance and angle bounds. Although distance-geometry techniques were originally introduced for structure determination from NMR, they were not widely adopted due to factors such as computational overhead, lack of scalability, and intolerance to missing or ambiguous data. On the other hand, molecular dynamics-based methods can introduce force-field artifacts into the final results. We use the following innovations to address these drawbacks: Instead of depending on random starting structures (as in molecular dynamics), DREAM leverages the natural distribution of experimental constraints into regions of larger (cores) and sparse data coverage (gaps). A divide-and-conquer framework is designed to model the cores and gaps separately, facilitating the parallel computation of the substructures. 
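As a rough illustration of the core/gap idea, the Python sketch below labels residues by how many distance restraints touch them and groups consecutive residues with the same label into segments. The abstract does not state the criterion DREAM actually uses, so the per-residue count, the `min_count` threshold, the function name, and the toy restraint list are all hypothetical.

```python
from collections import Counter
from itertools import groupby

def split_cores_and_gaps(n_residues, restraints, min_count=2):
    """Split residues 1..n_residues into contiguous 'core' and 'gap' segments.

    `restraints` is a list of (residue_i, residue_j) pairs. A residue counts
    as 'core' when at least `min_count` restraints involve it. Both the rule
    and the threshold are illustrative assumptions.
    """
    counts = Counter()
    for i, j in restraints:
        counts[i] += 1
        counts[j] += 1
    labels = ["core" if counts[r] >= min_count else "gap"
              for r in range(1, n_residues + 1)]
    segments = []
    position = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, position, position + length - 1))
        position += length
    return segments

# Toy restraint list: two densely restrained stretches joined by one restraint.
restraints = [(1, 3), (1, 2), (2, 3), (7, 9), (8, 9), (7, 8), (3, 7)]
segments = split_cores_and_gaps(10, restraints)
```

Each resulting core segment could then be handed to an independent worker, which is what makes a divide-and-conquer layout amenable to parallel computation.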
We use nonlinear optimization to compute structures for the cores in parallel, align core substructures in a single step, avoid errors from pairwise alignments, and model structure for the gaps. This is particularly effective for proteins with sparse coverage of experimental data. The distance-geometry approach removes the reliance on external factors such as force-field parameters to arrive at native structures before the post-processing stages. DREAM was tested to be robust to erroneous and missing data and can scale to large proteins with 52–271 amino acid residues. The bottom-up strategy of DREAM is closer to the protein folding paradigm, where smaller substructures are modeled first before being assimilated into the global structure. Such an approach was successfully shown to model folds and tertiary structures, whereas other contemporary distance geometry-based methods failed to yield protein conformation. DREAM was successfully tested across more than 100 protein folds. Notably, DREAM is accessible as an offline package or through a web portal, accepting input files in a widely used format. This compatibility enables potential future integration of DREAM into the NMR framework for automated or semi-supervised protein structure derivation. Comparing the protein structures from DREAM with publicly available conformations reveals notable differences, particularly in fluctuations within mobile regions. These variations offer valuable insights into protein dynamics and facilitate the investigation of protein functionality.