Multi-scale Modelling of HLA Diversity and Its Effect on Cytotoxic Immune Responses in Influenza H1N1 Infection
Abstract
Cytotoxic T-lymphocytes (CTLs) are important components of the adaptive immune system that scan the intracellular environment to detect and destroy infected cells. CTL responses play a major role in controlling virus-infected cells, as in HIV or influenza, and cells infected with intracellular bacteria, as in tuberculosis. To do so, CTLs require antigens to be presented to them, a function fulfilled by the major histocompatibility complex (MHC), known in humans as human leukocyte antigen (HLA) molecules. Binding of antigenic peptides to class-I HLA molecules is a prerequisite for triggering CTL immune responses. Individuals differ significantly in their ability to respond to an infection. Among the factors that govern the outcome of an infection, HLA polymorphism in the host is one of the most important. Despite a large body of work on HLA molecules, much remains to be understood about the relationship between HLA diversity and disease susceptibility. High complexity arises from HLA allele polymorphism, extensive antigen cross-presentability, and host-pathogen heterogeneity. A given allele can recognize a number of different peptides from various pathogens, and a given peptide can also bind to alleles carried by a number of different individuals. Thus, given the plurality of peptide-allele pairs and the large number of alleles, understanding the differences in recognition profiles and their implications for disease susceptibility requires mathematical modelling and computational analysis.
The main objectives of the thesis were to understand heterogeneity in antigen presentation by HLA molecules at different scales, and how that heterogeneity translates into variations in disease susceptibility and, finally, into disease dynamics in different populations. Towards this goal, the variations in HLA alleles first need to be characterized systematically and their recognition properties understood. A structure-based classification of all known HLA class-I alleles was therefore attempted. In the process, it was also of interest to see whether an understanding of sub-structures in the binding grooves of HLA molecules could help in high-confidence prediction of epitopes for different alleles. Next, the goal was to understand how HLA heterogeneity affects disease susceptibility and disease spread in populations. This was studied at two levels: first, by modelling the HLA genotypes and CTL responses in different populations and assessing how they recognize epitopes from a given virus; and second, by modelling the disease dynamics given the predicted susceptibilities in different populations. Influenza H1N1 infection was used as a case study. The specific objectives addressed are: (a) to develop a classification scheme for all known HLA class-I alleles that can explain epitope recognition profiles, and further to dissect the physicochemical features responsible for differences in peptide specificities; (b) to derive a statistical model from a large dataset of HLA-peptide complexes and use it to identify the interdependencies of residues at different peptide positions, thereby rationalizing HLA class-I allele binding specificity in greater detail; (c) to understand the effect of HLA heterogeneity on CTL-mediated disease response, for which a model of HLA genotypes for different populations was constructed and used to estimate the response to H1N1 via the prediction of epitopes; and (d) to model disease dynamics in different populations using the CTL response grouping, and to evaluate the effect of heterogeneity on different vaccination strategies.
The four objectives listed above are addressed in turn in Chapters 2 to 5, followed by Chapter 6, which summarises the findings of the thesis and presents future directions. Chapter 1 presents an introduction to the importance of the function of HLA molecules, and describes structural bioinformatics as a discipline and the methods available for it. The chapter also describes the different mathematical modelling strategies available for studying host immune responses.
Chapter 2 describes a novel method for structure-based hierarchical classification of HLA alleles. At present, more than 2000 HLA class-I alleles have been reported, and they vary only across their peptide-binding grooves. The polymorphism they exhibit enables them to bind a wide range of peptide antigens from diverse sources. HLA molecules and peptides present a complex molecular recognition pattern owing to the multiplicity of their associations. A powerful grouping scheme that not only provides an insightful classification but can also dissect the physicochemical basis of recognition specificity is therefore needed to address this complexity. The study reports a hierarchical classification of 2010 class-I alleles obtained with a systematic divisive clustering method.
All-pair distances between alleles were obtained by comparing binding pockets in their structural models. By varying the similarity threshold, a multilevel classification with 7 supergroups was derived, each further subdivided to yield a total of 72 groups. An independent clustering scheme based only on similarities in the alleles' epitope pools correlated highly with the pocket-based clustering. The physicochemical feature combinations that best explain the observed clustering are identified. Mutual information calculated for the set of peptide ligands enables identification of the binding-site residues that contribute to peptide specificity. The grouping of HLA molecules achieved here will be useful for rational vaccine design, for understanding disease susceptibilities and for predicting risk in organ transplantation. The results are presented in an interactive web server, http://proline.iisc.ernet.in/hlaclassify.
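The idea of deriving a multilevel grouping from all-pair distances by varying a threshold can be illustrated with a minimal sketch. This is not the clustering procedure of the thesis: the toy distance matrix, the cutoff values and the function names are all hypothetical, and the split rule used here (connected components of the graph linking items closer than a cutoff) is only one simple way to realise a top-down split.

```python
from typing import List, Sequence


def components(items: Sequence[int],
               dist: Sequence[Sequence[float]],
               cutoff: float) -> List[List[int]]:
    """Split `items` into connected components of the graph that links
    two items whenever their distance is <= cutoff."""
    unseen = set(items)
    groups = []
    while unseen:
        seed = min(unseen)
        unseen.remove(seed)
        stack, group = [seed], [seed]
        while stack:
            i = stack.pop()
            linked = [j for j in unseen if dist[i][j] <= cutoff]
            for j in linked:
                unseen.remove(j)
            stack.extend(linked)
            group.extend(linked)
        groups.append(sorted(group))
    return groups


def multilevel(dist: Sequence[Sequence[float]],
               cutoffs: Sequence[float]) -> List[List[List[int]]]:
    """Top-down split: apply successively tighter cutoffs, each level
    refining the groups of the previous one. Returns one grouping per level."""
    levels = []
    current = [list(range(len(dist)))]
    for cutoff in cutoffs:
        current = [sub for g in current for sub in components(g, dist, cutoff)]
        levels.append(current)
    return levels


# Hypothetical pocket distances between five alleles (indices 0..4).
dist = [
    [0.0, 0.1, 0.4, 0.9, 0.9],
    [0.1, 0.0, 0.4, 0.9, 0.9],
    [0.4, 0.4, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.9, 0.2, 0.0],
]
# A loose cutoff gives the coarse level, a tighter one the fine level.
supergroups, groups = multilevel(dist, cutoffs=[0.5, 0.25])
```

With these toy values the loose cutoff separates alleles 0 to 2 from alleles 3 and 4, and the tighter cutoff then splits allele 2 away from 0 and 1, which mirrors the supergroup and group levels described above on a very small scale.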
In Chapter 3, the structural features responsible for generating peptide recognition specificities are first analysed and then used to predict T-cell epitopes for any class-I HLA allele. Since the identification of epitopes is central to many questions in immunology, several HLA-peptide complexes are studied at the structural level, and factors are identified that discriminate good binder peptides from those that are not. T-cell epitopes serve as molecular keys that initiate adaptive immune responses, and their identification is also a key step in rational vaccine design. Most available methods are driven by informatics and depend critically on experimentally obtained training data. Analysis of the training sets from IEDB for several alleles indicates that sampling of the peptide space is extremely sparse, covering only a tiny fraction of the possible nonamer space, and is also heavily skewed, which restricts the range of epitope prediction. A new epitope prediction method, HLaffy, is therefore developed. The method has four distinct modules: (a) structural modelling, estimation of statistical pair-potentials and constraint derivation, (b) implicit modelling and interaction profiling, (c) binding affinity prediction through feature representation and (d) use of graphical models to extract peptide sequence signatures to predict epitopes for HLA class-I alleles. HLaffy is a novel and efficient method that predicts epitopes for any HLA class-I allele by estimating the binding strengths of peptide-HLA complexes, which is achieved by learning the pair-potentials important for peptide binding. It rests on a mechanistic understanding of HLA-peptide recognition and provides an estimate of the total ligand space for each allele. The method is made accessible through a web server, http://proline.biochem.iisc.ernet.in/HLaffy.
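The sparsity of sampling mentioned above follows from simple arithmetic: with 20 standard amino acids at each of 9 positions, there are 20^9 possible nonamers. The short sketch below computes this number and the fraction covered by a training set. The training-set size used here is a hypothetical round number chosen for illustration, not a figure from the thesis or from IEDB.

```python
AMINO_ACIDS = 20      # standard amino acid alphabet
PEPTIDE_LENGTH = 9    # nonamer


def nonamer_space(alphabet: int = AMINO_ACIDS,
                  length: int = PEPTIDE_LENGTH) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet ** length


def coverage(n_unique_peptides: int) -> float:
    """Fraction of the nonamer space covered by a set of unique peptides."""
    return n_unique_peptides / nonamer_space()


total = nonamer_space()                 # 20**9 = 512,000,000,000
hypothetical_training_size = 10_000     # assumption, for illustration only
fraction = coverage(hypothetical_training_size)
print(f"{total:,} nonamers; coverage {fraction:.2e}")
```

Even a training set of ten thousand unique peptides would cover only about two parts in a hundred million of the space, which is the kind of sparsity that motivates a structure-based approach.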
In Chapter 4, the effect of genetic heterogeneity on disease susceptibility is investigated. Despite a large body of work on HLA molecules, much remains to be understood about how host HLA diversity affects disease susceptibility. The high complexity arising from polymorphism, extensive cross-presentability among HLA alleles, and host and pathogen heterogeneity demands investigation through computational approaches. Host heterogeneity in a population is modelled through a molecular systems approach, starting with the mining of ‘big data’ from the literature. The insights derived from this are used to investigate the effect of heterogeneity in a population in terms of its impact on pathogen recognition. A case study of influenza virus H1N1 infection is presented. For this, a comprehensive CTL immunome is defined by taking a consensus prediction from three distinct methods. Next, HLA genotypes are constructed for different populations using a probabilistic method. Epidemic incidences are in general observed to correlate with poor CTL response in populations. The study shows that large populations can be classified into a small number of groups, called response types, that are specific to a given viral strain. Individuals of a response type are expected to exhibit similar CTL responses. The extent of CTL response varies significantly across populations and increases with genetic heterogeneity. Overall, the study presents a conceptual advance towards understanding how genetic heterogeneity influences disease susceptibility in individuals and in populations. Lists of top-ranking epitopes and proteins, ranked on the basis of conservation, antigenic cross-reactivity and population coverage, are also derived; these provide ready short-lists for rational vaccine design (Flutope).
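The chain from allele frequencies to genotypes to response types can be sketched in a few lines. Everything in this example is an assumption made for illustration: the allele labels, their frequencies and the epitope sets are invented, only a single locus is sampled, and grouping individuals by the number of epitopes they can present is just one plausible way to define a response type, not necessarily the definition used in the thesis.

```python
import random
from collections import Counter

# Hypothetical allele frequencies at one locus in one population.
ALLELE_FREQ = {"allele_a": 0.5, "allele_b": 0.3, "allele_c": 0.2}

# Hypothetical predicted epitopes of the virus for each allele.
EPITOPES = {
    "allele_a": {"e1", "e2", "e3"},
    "allele_b": {"e3", "e4"},
    "allele_c": set(),
}


def sample_genotype(rng: random.Random) -> tuple:
    """Draw two alleles independently according to their frequencies."""
    alleles = list(ALLELE_FREQ)
    weights = [ALLELE_FREQ[a] for a in alleles]
    return tuple(sorted(rng.choices(alleles, weights=weights, k=2)))


def recognized(genotype: tuple) -> set:
    """Union of the epitopes presented by the alleles of a genotype."""
    presented = set()
    for allele in genotype:
        presented |= EPITOPES[allele]
    return presented


def response_types(n_individuals: int, seed: int = 0) -> Counter:
    """Group simulated individuals by how many epitopes they can present."""
    rng = random.Random(seed)
    return Counter(
        len(recognized(sample_genotype(rng))) for _ in range(n_individuals)
    )


counts = response_types(1000)
for n_epitopes, n_people in sorted(counts.items()):
    print(f"{n_epitopes} epitopes presented: {n_people} individuals")
```

In this toy setting six possible genotypes collapse into at most four groups, which illustrates how a large population can fall into a small number of response types for a given strain.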
Next, in Chapter 5, the effect of genetic heterogeneity on disease dynamics is investigated. A mathematical framework has been developed to incorporate heterogeneity information in the form of the response types described in the previous chapter. The spread of a disease in a population is a complex process controlled by various factors, ranging from molecular-level recognition events to socio-economic causes. Response typing allows the identification of distinct groups of individuals, each with a different extent of susceptibility to a given strain of the virus. Three approaches are used for modelling: an SIR model, a stochastic model and an agent-based model. In the SIR model, the response types are treated as partitions of the S, I and R compartments; initially, models are developed in which the S compartment is sub-divided into groups based on the response types. This analysis shows an effect on the infection sweep time, i.e., how long the infection stays in the population. The stochastic model incorporates environmental noise due to random variation in population influx through birth, death or migration. The system is observed to show higher stability in the presence of genetic heterogeneity.
Because the contagion spreads only through direct host-to-host contact, the topology of the contact network plays a major role in determining the disease dynamics. An agent-based computational framework has been developed for modelling disease spread by considering the spatial distribution of the agents, their movement patterns and the resulting contact probabilities. The agent-based model (ABM) incorporates the temporal patterns of contacts; it is based on a city-block model and captures the movement of individuals parametrically. A new concept, the ‘characteristic time’ of the system, has been introduced in the context of a time-evolving network. The characteristic time is the minimum time required to ensure that every individual is connected to all other individuals in the time-aggregated contact network. For any given temporal system, the disease time must exceed the characteristic time for the disease to spread throughout the population. A shorter characteristic time suggests faster spread of the disease. A disease-spread network is constructed, which shows how the disease spreads from one infected individual to others in the city, given the contact rules and the individuals' relative susceptibilities to the viral strain. A high degree of population heterogeneity is seen to result in a longer disease residence time. The more susceptible individuals are preferentially infected first, thereby exposing further susceptible individuals to the disease. Vaccination strategies are derived from the model, which indicates that vaccinating only 20% of the agents, those that are hub or highly central nodes and that also have many connections to susceptible agents, leads to high levels of herd immunity and can confer protection on the rest of the population.
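The definition of characteristic time given above can be made concrete with a small sketch. It assumes that "connected" means that the time-aggregated contact network forms a single connected component, and that contacts are available as (time, agent, agent) records. The function name, the input format and the toy contact list are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Optional, Tuple


def characteristic_time(n_agents: int,
                        contacts: Iterable[Tuple[int, int, int]]) -> Optional[int]:
    """Earliest time at which the network aggregated over all contacts
    up to that time is connected; None if it never becomes connected."""
    if n_agents <= 1:
        return 0  # a single agent is trivially connected (convention assumed here)

    parent = list(range(n_agents))  # union-find forest over agents

    def find(x: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_components = n_agents
    for t, u, v in sorted(contacts):      # process contacts in time order
        root_u, root_v = find(u), find(v)
        if root_u != root_v:              # this contact merges two components
            parent[root_u] = root_v
            n_components -= 1
            if n_components == 1:
                return t
    return None


# Hypothetical contact records (time, agent, agent) for four agents.
toy_contacts = [(1, 0, 1), (2, 2, 3), (3, 0, 1), (5, 1, 2), (7, 0, 3)]
tau = characteristic_time(4, toy_contacts)
```

Here the aggregated network first becomes connected at the contact between agents 1 and 2, so the later contact between agents 0 and 3 does not affect the result. Note that this reading ignores the order of contacts along a path; a stricter definition based on time-respecting paths would generally give a longer characteristic time.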
Overall, the thesis provides a biologically meaningful classification of all known HLA class-I alleles and unravels the physicochemical basis of their peptide recognition specificities. The thesis also presents a new algorithm for estimating peptide binding affinities and, consequently, predicting epitopes for all alleles. Finally, the thesis presents a conceptual advance in relating HLA diversity to disease susceptibility and explains how different populations can respond differently to a given infection. A case study with the influenza H1N1 virus identified the populations that are most and least susceptible, in the process identifying important epitopes and responder alleles and providing important pointers for vaccine design. The influence of heterogeneity and response typing on disease dynamics is also presented for influenza H1N1 infection, which has led to the rational identification of effective vaccination strategies. The methods and concepts developed here are fairly generic and can be adapted easily for studying other infectious diseases.
Three new web resources, (a) HLAclassify, (b) HLaffy and (c) Flutope, have been developed; they host pre-computed results and also allow a user to interactively perform analyses with a specific allele, peptide or pathogen genome sequence.