A computational systems biology approach for elucidating molecular features of primary and metastatic melanoma
Abstract
Malignant melanoma, a cancer arising from melanocytes, is reported to have one of
the fastest-growing incidence rates worldwide and is considered one of the most
aggressive human malignancies. According to the World Health Organization (WHO),
current statistics indicate that 132,000 cases occur globally each year, a number that is
set to rise by 2-3% every year.
If detected early, a complete surgical excision of the tumor can be performed. However,
in many cases, diagnosis and treatment are delayed, leading to poor prognosis, with a
survival expectation of a mere 6-9 months in the case of metastatic melanoma. Due to
its high incidence rates, the difficulty of early diagnosis and its rapid progression to
metastasis, melanoma is an important malignancy to study.
Diagnosis and stage classification of many diseases including cancers is still a major
challenge. Melanoma is a highly heterogeneous disease, with multiple sources of
heterogeneity. It can arise in many regions of the body, with varied incidence patterns.
It is also linked to various alterations at the molecular level, with the pattern of
alterations varying widely across patients. As a result of this heterogeneity, existing
diagnostic and therapeutic methods can lead to poor outcomes in one subset of patients
even if they are effective in another. A large amount of genomic and
transcriptomic data of tumor samples from patients is now available, which provides
new opportunities for understanding disease mechanisms and identifying specific
molecular features characteristic of the disease stage and sub-type. In this context, a
feature refers to a gene or a pool of genes or even a pathway. Identification of molecular
features from such large complex data is still a major challenge in many diseases, and
is currently a highly pursued objective. The multi-level complexities involved in the
disease and the need to study large patient data to understand the perturbations at a
systems level necessitate the use of large-scale computational approaches.
While melanoma, like many other diseases, can be associated with variations at
multiple levels, the onset of the disease is due to mutations in critical genes. Metabolic
alterations are also known to occur in the disease, which cater to the high energy needs
of a progressive tumor. From a different perspective, modulation of the immune
processes has also been studied, showing that the tumor evades the immune system
and manages to proliferate. In this work, a computational systems biology approach
was used to identify the molecular features that address all these aspects and are
responsible for the progression of melanoma.
As a first objective, in Chapter 2, a new network-based computational pipeline
combined with a machine learning method, which uses publicly available
transcriptomic data of melanoma patient samples, was developed to identify signature
genes that can efficiently distinguish metastatic melanoma from primary melanoma.
These genes are potential biomarkers for identifying progression in
melanoma patients. To begin with, a condition-specific protein-protein interaction
network was constructed for each of the three conditions: normal skin (NS), primary
melanoma (PM) and metastatic melanoma (MM). Next, the active paths in each of the
networks were computed using a shortest-path approach. The paths that differed in
MM compared to NS, PM compared to NS and MM compared to PM were identified
using a string similarity metric. These perturbed paths were further pruned based on
the influence they wield on the entire system. To do this, network communities were
identified and the genes in them were scored by the number of communities they spanned.
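The path computation and comparison steps can be illustrated with a minimal sketch. It assumes that edge weights are lower for more active interactions and that a path is compared as a sequence of gene names; the abstract does not name the string similarity metric, so the ratio from Python's difflib is used here purely as a stand-in. The genes A to D and all weights are hypothetical toy values.

```python
import heapq
from difflib import SequenceMatcher


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts weighted graph; returns a node list or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_similarity(path_a, path_b):
    """Similarity in [0, 1] between two paths treated as sequences of gene names."""
    return SequenceMatcher(None, path_a, path_b).ratio()


# Toy condition-specific networks (assumption: lower weight = more active edge).
primary = {"A": {"B": 1.0, "C": 4.0}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}
metastatic = {"A": {"B": 4.0, "C": 1.0}, "B": {"D": 1.0}, "C": {"D": 1.0}, "D": {}}

pm_path = shortest_path(primary, "A", "D")
mm_path = shortest_path(metastatic, "A", "D")
similarity = path_similarity(pm_path, mm_path)
print(pm_path, mm_path, round(similarity, 3))
```

A low similarity between the paths joining the same pair of genes in two conditions would flag that path as perturbed; the cutoff to use is a modelling choice that this sketch does not fix.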
Using this, the most influential, differentially expressed genes in all three
comparisons were identified and taken as a shortlist of markers. The shortlisted
genes were further evaluated by a machine learning approach and ranked by their
discriminatory capacity. Based on a feature elimination exercise, a minimal gene set
with the maximum efficiency for classification between each pair of conditions was
identified using a Random Forest classifier. From this, a panel of 6 genes (ALDH1A1,
HSP90AB1, KIT, KRT16, SPRR3 and TMEM45B), whose expression values
discriminated metastatic from primary melanoma with 87% classification accuracy, was
identified. In an independent transcriptomic data set derived from 703 primary
melanomas from a collaborator’s laboratory, it was observed that all six genes in the
panel were significant in predicting melanoma-specific survival (MSS) in a univariate
analysis, which was also consistent with AJCC staging. Further, 3 of these genes,
HSP90AB1, SPRR3 and KRT16, remained significant predictors of MSS in a joint
analysis of the 6 genes (HR=2.3, P=0.03), although only HSP90AB1 (HR=1.9, P=2x10^-4)
remained predictive after adjusting for clinical predictors. A panel of 20 genes
with high discriminatory capacity to classify PM vs. NS and a panel of 25 genes for
MM vs. NS were also identified.
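The feature elimination idea used in Chapter 2 can be sketched as a greedy backward search. Note the substitution: the thesis uses a Random Forest classifier, whereas this dependency-free sketch swaps in a nearest-centroid classifier scored by leave-one-out accuracy, so it illustrates only the elimination loop. The features g1 and g2 and the four samples are made-up toy data, not values from the study.

```python
def loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in set(labels):
            rows = [s for j, (s, lab) in enumerate(zip(samples, labels))
                    if j != i and lab == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))

        predicted = min(sorted(centroids), key=dist)
        correct += predicted == y
    return correct / len(samples)


def backward_elimination(samples, labels, features):
    """Greedily drop features while leave-one-out accuracy does not decrease."""
    selected = list(features)
    best = loo_accuracy(samples, labels, selected)
    while len(selected) > 1:
        trials = [(loo_accuracy(samples, labels, [f for f in selected if f != g]), g)
                  for g in selected]
        score, drop = max(trials)
        if score < best:
            break
        best = score
        selected.remove(drop)
    return selected, best


# Toy expression data: g1 separates the classes, g2 is noise (hypothetical values).
samples = [
    {"g1": 1.0, "g2": 0.0},
    {"g1": 2.0, "g2": 6.0},
    {"g1": 8.0, "g2": 6.0},
    {"g1": 9.0, "g2": 0.0},
]
labels = ["PM", "PM", "MM", "MM"]

panel, accuracy = backward_elimination(samples, labels, ["g1", "g2"])
print(panel, accuracy)
```

In this toy case the uninformative feature is dropped without any loss of accuracy, which is the sense in which a minimal gene set with maximum classification efficiency is sought.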
Chapter 3 describes the work carried out to identify potential driver mutations in
melanoma. Melanoma is a malignancy with a high mutation burden, with the onset of
the disease attributed to mutations caused by external stresses. The mutations are
observed to modulate multiple pathways, with the landscape of mutations varying
across patients. In some cases, different mutations resulting in the same end effect can
also be seen. These observations highlight the extent of heterogeneity among melanoma
patients. A novel algorithm, DMIN (Driver Mutation identification using Influence
Network), was designed to identify patient-specific potential driver mutations using the
mutation information and gene expression variation of 362 melanoma patients from
the TCGA dataset, integrated with a comprehensive protein-protein
interaction network. Active paths were computed on the shortest-path principle,
from each mutated node as a source to all other nodes as possible destinations,
in the 362 patient-specific networks. The paths were further scored and prioritized in
each patient to identify the mutations and differentially expressed nodes, referred to
as outliers. A tripartite graph was constructed with patients (P), mutations (M)
and outliers (O) as three connected node sets. The M nodes were ranked by
betweenness centrality and, based on a percentile threshold, 59 potential driver mutations
were identified, which were found to be statistically significant. The performance of the
DMIN method was further validated by comparison with three existing methods,
and DMIN was found to outperform them. Co-occurring mutation
combinations were also computed and shortlisted based on their effect on the survival
of the patients. In total, 68 combinations of 2 to 12 genes with a high hazard
to survival were identified. Finally, driver mutations were computed for patient groups
defined by clinicopathological information such as sample type, mutation
subtype, AJCC stage, Breslow thickness, gender and ulceration, and the pathways enriched
in each of these conditions are described.
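The ranking of mutation nodes by betweenness centrality can be sketched as follows. The abstract does not say how the three node sets are wired, so this sketch assumes patient-mutation and mutation-outlier edges only; the records and the helper names (build_tripartite, rank_mutations) are hypothetical. Betweenness is computed with Brandes' algorithm on the unweighted graph, and the percentile threshold and significance test of DMIN are not reproduced.

```python
from collections import deque


def build_tripartite(records):
    """Undirected graph with assumed P-M and M-O edges from (patient, mutation, outlier) triples."""
    adj = {}
    for patient, mutation, outlier in records:
        p, m, o = "P:" + patient, "M:" + mutation, "O:" + outlier
        for a, b in ((p, m), (m, o)):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes' algorithm: unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def rank_mutations(adj):
    """Mutation nodes sorted by decreasing betweenness."""
    scores = betweenness(adj)
    mutations = [v for v in adj if v.startswith("M:")]
    return sorted(mutations, key=lambda v: (-scores[v], v)), scores


records = [("P1", "M1", "O1"), ("P2", "M1", "O2"), ("P3", "M1", "O2"), ("P3", "M2", "O2")]
ranking, scores = rank_mutations(build_tripartite(records))
print(ranking, scores["M:M1"], scores["M:M2"])
```

In this toy graph M1 lies on almost every shortest path between patients and outliers, so it ranks first; a percentile cutoff on such scores would then select the candidate drivers.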
Metabolic rewiring is an important characteristic of tumor cells. The pathway
rewiring accounts for the increased energy requirements and also aids the proliferation
of the cells. In Chapter 4, flux balance analysis was carried out using a genome-scale
metabolic model to identify the variations associated with progression of the
cancer. To begin with, a melanoma metabolic model was constructed using a general
human metabolic model and gene expression data of NS, PM and MM samples. The
flux-level variations were computed between the PM-NS, MM-NS and MM-PM conditions
and the sub-systems that varied were identified. The reactions belonging to ROS
detoxification, the Warburg effect and tyrosine metabolism were found to vary considerably
in the melanoma condition. In addition, variations in vitamin A and vitamin C
metabolism were observed between MM and PM.
Gene essentiality analysis on the metabolic model identified 5 genes required
for cancer proliferation, which can be validated as potential therapeutic
targets.
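For reference, flux balance analysis is conventionally posed as a linear program over the reaction fluxes. The formulation below is the standard textbook form, given as a sketch: the symbols (S for the stoichiometric matrix, v for the flux vector, c for the objective weights, R_g for the reactions that depend on gene g, theta for the essentiality cutoff) are notation assumed here, and the abstract does not state which objective or cutoff was used in the thesis.

```latex
% Flux balance analysis at steady state
\begin{aligned}
Z_{\mathrm{wt}} = \max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for every reaction } j .
\end{aligned}

% In silico knockout of gene g: close the reactions that require g and re-solve
v^{\min}_j = v^{\max}_j = 0 \quad \text{for all } j \in R_g ,
\qquad
g \text{ is called essential if } Z_{g} < \theta \, Z_{\mathrm{wt}} .
```

Comparing the optimal fluxes obtained under condition-specific bounds (NS, PM, MM) is one common way to obtain flux-level differences between conditions.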
In Chapter 5, the molecular features of the immune system involved in the progression of
melanoma were investigated. FOXP3+ regulatory T cells (Treg) are immune cells
that maintain an immune check by suppressing the immune activation
function of effector T cells (Teff). A high Treg population is linked to poor prognosis,
whereas a high Teff population is linked to good prognosis. High FOXP3 expression levels
correlate well with low survival of melanoma patients. Consistent with this, a low Treg:Teff
ratio in the tumor microenvironment has been associated with the success of IL-2-based
immunotherapy. A network-based analysis was carried out using transcriptome
expression values of FOXP3_high and FOXP3_low primary melanoma patients.
Active paths in FOXP3_high patients were identified and the associated genes reported.
PTEN and FOS were predicted to modulate the expression of FOXP3, leading to an
immunosuppressed environment. A simple deterministic model was also constructed to
mimic the population interplay between Treg and Teff cells in the tumor
microenvironment, which provides a basis for predicting disease outcome and survival
prognosis in a given patient.
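To make the idea of a deterministic Treg/Teff population model concrete, here is a toy two-equation sketch integrated with the forward Euler method. The equations, parameter names and values are all assumptions chosen for illustration (logistic Teff growth, Treg-mediated suppression, Teff-driven Treg induction); they are not the model constructed in the thesis.

```python
def simulate(induction, suppression=1.0, growth=1.0, capacity=1.0,
             decay=0.5, teff0=0.1, treg0=0.1, dt=0.01, t_end=100.0):
    """Forward-Euler integration of a hypothetical Teff/Treg model.

    dTeff/dt = growth * Teff * (1 - Teff / capacity) - suppression * Treg * Teff
    dTreg/dt = induction * Teff - decay * Treg
    """
    teff, treg = teff0, treg0
    for _ in range(int(round(t_end / dt))):
        d_teff = growth * teff * (1 - teff / capacity) - suppression * treg * teff
        d_treg = induction * teff - decay * treg
        teff, treg = teff + dt * d_teff, treg + dt * d_treg
    return teff, treg


# Compare weak and strong Treg induction (hypothetical parameter values).
teff_low, treg_low = simulate(induction=0.1)
teff_high, treg_high = simulate(induction=1.0)
print("weak induction:   Teff=%.3f Treg:Teff=%.2f" % (teff_low, treg_low / teff_low))
print("strong induction: Teff=%.3f Treg:Teff=%.2f" % (teff_high, treg_high / teff_high))
```

With these assumed parameters, stronger Treg induction settles at a higher Treg:Teff ratio and a smaller effector population, which mirrors the qualitative link between a high Treg population and poor prognosis described above.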
In summary, this thesis presents an integrated approach for the identification of molecular
markers, metabolic variations and driver mutations of melanoma. The outcome of the
work holds promise for efficient classification of the disease stages and may also aid
in predicting the prognosis of melanoma. The methods developed for
identification of biomarkers and driver mutations are fairly general and can easily be
adapted for studying other diseases as well.