dc.description.abstract | Biological systems are complex networks of molecular components that function in a tightly controlled manner. Any perturbation in these components or their interactions affects the normal physiology of the cell and often leads to disease pathogenesis and progression. The vast amount of clinical data describing various aspects of these components (genome to phenome) opens up opportunities to capture disease-specific rewiring. Most importantly, systems biology provides a framework for leveraging such high-throughput data to infer context-specific perturbations and gain new insights into disease and health.
The broad objective of this thesis is to use unbiased, comprehensive systems-level modeling and analysis to address key questions about complex diseases. Towards this, contextualized network models were constructed and analyzed to identify (a) blood-based biomarkers for precise differential diagnosis of viral and bacterial infections, by modeling the host response to acute infections, and to evaluate their relevance in disease recovery; (b) transcriptional regulatory factors that define different macrophage polarization states and plasticity, and their relevance in acute bacterial infection; and (c) a Common Disease Response Core across different complex diseases, through a large-scale blood transcriptome analysis, together with the central molecular players that give rise to multiple perturbation patterns across diverse disease phenotypes. In addition, different network inference methods were compared, establishing that integrative knowledge-based network approaches outperform statistical approaches in deriving biologically relevant insights.
Inferring context-specific perturbations from high-throughput data is crucial for understanding disease mechanisms. Such perturbations have been inferred using two broad classes of network-based methods: statistical methods and knowledge-based methods. However, efforts to compare these two approaches have been few and far between. Moreover, no report so far has systematically evaluated the biological relevance of the genes or pathways inferred from these networks. Towards this, four different methods were compared, and the relevance of the inferred perturbations for the considered datasets was assessed. Two of the four methods, WGCNA and ARACNe, belong to the broad class of data-driven approaches that do not rely on prior network information. ResponseNet and jActiveModules, on the other hand, use knowledge-based protein-protein interaction networks and integrate condition-specific transcriptome or proteome data. The interactions inferred by all four approaches were assessed for their biological relevance using three criteria: (1) enrichment of gold standard gene sets, (2) comparison to gold standard pathways, and (3) recovery, from the context-specific perturbed network, of hub genes known to be related to the given condition. Overall, ResponseNet outperformed the other methods in both tuberculosis and melanoma on all three criteria.
Infectious diseases form a significant portion of the health care burden and cause many hospitalizations and deaths globally. The ongoing COVID-19 pandemic is a grim testimony to the damage an infectious disease can cause to human health and welfare. Determining the etiology of an infection guides the clinician toward the optimal treatment and helps optimize the use of hospital resources. In this work, whole blood transcriptomes from six independent datasets (n=756) were used to computationally model the host response to infections and to discover a new 10-gene biomarker panel (Panel-VB) that discriminates bacterial from viral infections. Panel-VB was validated in eleven independent datasets (n=898) and demonstrated high predictive performance, with a weighted mean AUROC of 0.97 (95% CI: 0.96 – 0.99). Based on this panel, a new standalone, patient-wise diagnostic score, VB10, was devised; it showed high diagnostic accuracy, with a weighted mean AUROC of 0.94 (95% CI: 0.91 – 0.98), across 2,996 patient samples from 56 public datasets spanning 19 countries.
Further, VB10 was evaluated in a newly generated South Indian cohort and showed 97% accuracy in known cases of bacterial and viral infection. VB10 (a) accurately identified the infection class in culture-negative indeterminate cases, (b) correlated with recovery from infection, (c) diagnosed accurately across different age groups, and (d) covered a wide spectrum of acute bacterial and viral infections, including those caused by uncharacterized pathogens. When tested on publicly available COVID-19 data, the VB10 score indicated viral infection in both in vitro and patient samples. These results indicate the potential clinical utility of VB10 for precise diagnosis of acute infections and for recovery monitoring, providing decision support for antibiotic prescription.
Macrophages polarize to different activation states in response to various endogenous and exogenous activation signals. Each state has a characteristic transcriptional, metabolic and cytokine profile, marking a different point in the polarization spectrum and resulting in distinct cellular outcomes that define health and disease. The two extremes of the spectrum are denoted as the classical (M1) and alternative (M2) polarization states, and a large body of work has probed the molecular causes as well as the consequences of such activation. However, the transcriptional profiles and the consequent molecular interaction trajectories that govern which precise polarization state is attained upon exposure to a given trigger remain largely unknown. In this work, key molecular factors that define different polarization states were identified by probing the transcriptional regulatory landscape of macrophages. In an integrative network approach, differential transcriptome profiles of macrophages responding to twenty-eight different activation signals were mapped onto a comprehensively curated genome-scale human transcriptional regulatory network, which was then interrogated to identify epicentres of perturbation in response to different immunostimulants. Contextualized network models of macrophages exposed to these stimulants span a punctuated continuum of states forming 12 clusters, with M1 and M2 at the two ends. Barcodes of epicentric transcription factors (TFs) for each polarization state were identified using a specifically configured computational pipeline consisting of a series of filters. A set of three TFs that controls switching between the M1 and M2 states was identified. siRNA knockdown of this set (NBC: NFE2L2, BCL3, and CEBPB) switched macrophages from an M1 to an M2 phenotype despite exposure to an M1 stimulant.
Switching macrophages from the M1 to the M2 state via siRNA knockdown of NBC makes the host hyper-susceptible to Staphylococcus aureus infection, indicating a role for inflammatory M1-like macrophages in containing the bacterial load.
Whole blood transcriptomes of various diseases show substantial variation in the expression patterns of many genes. While some of these variations are well explained by independent molecular-level studies, the effects of many others remain unexplained and have therefore received little attention. In particular, very little is known about (a) the genes frequently perturbed across multiple diseases, (b) a common disease response core, and (c) the central molecular players in that core that give rise to multiple perturbation patterns explaining diverse disease phenotypes. In this work, publicly available whole blood transcriptomes of various diseases were analyzed to address these questions. Blood transcriptomes were found to contain sufficient information to define and distinguish among different diseases. The disease classification inferred from differential blood transcriptome profiles is consistent with (a) the Disease Ontology (DO), based on molecular data, and (b) the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), based on clinical symptoms. Further, enrichment analysis showed that processes associated with the innate immune system were significantly perturbed across almost all diseases and formed the Common Disease Response Core (CDRC). However, the exact nature of the perturbation in this core varies from disease to disease. The genes forming the CDRC are sufficient to reproduce the disease clustering pattern obtained with the whole transcriptome. A set of the 25 most influential genes, associated with different perturbation patterns in different disease classes, was also identified. These 25 genes, all of which are transcription factors, are putative tunable factors (TF-25) whose specific expression levels define the tuning patterns observed in different disease classes. The tuned patterns are illustrated here through a case study of two disease classes involving lung pathology.
Each disease class is perturbed by a specific subset of TF-25, with specific innate immune pathways affected downstream. The tuned patterns provide a comprehensive resource for studying differential pathogenesis mechanisms across a wide spectrum of diseases and for developing new diagnostic and therapeutic strategies.
In conclusion, the work presented in this thesis provides a comprehensive perspective on the host response to systemic diseases, using integrative network and data-mining approaches for specific biomedical applications. | en_US |