dc.description.abstract | Understanding the molecular structure of DNA is considered one of the greatest achievements in modern biology. It helped in understanding fundamental cellular processes such as DNA replication, the nature of the genetic code and transcription. It also led to technological advances such as DNA sequencing, genetic engineering and gene cloning. The DNA molecule is highly polymorphic, and its structure depends on the environment, base composition and sequence context. B-DNA, A-DNA, Z-DNA and curved or kinked DNA are some of the well-characterized double-helical polymorphs. B-DNA is the most prevalent structure in vivo and can undergo small local as well as global variations. In this thesis, a distinct structural property of a particular DNA sequence refers to its deviation from the structural parameters of fibre-model B-DNA or of random-sequence DNA. Structural properties of DNA are an outcome of the linear arrangement of the four chemically different nucleotide bases and of the characteristic features of the two grooves (minor and major), which arise from the asymmetric position of the glycosidic bonds of the base pairs. DNA structure and properties are therefore expected to vary along its length. Several structural features have been defined for the DNA duplex; of these, DNA stability, bendability and intrinsic curvature are well studied and have been found to be biologically relevant. These three sequence-dependent properties differ in their nature and information content and can be studied at both local and global levels, depending on the length of the DNA fragment being examined. The majority of the work in this thesis focuses on the analysis of these three DNA structural features in promoter regions of different eukaryotic systems and their relationship with gene expression. The thesis work is divided into five sections, briefly described below.
The sections discuss the prevalence of the three structural features (DNA stability, bendability and intrinsic curvature) in the promoter regions of six eukaryotic systems, namely S. cerevisiae, D. melanogaster, C. elegans, zebrafish, mouse and human. The relationship between the DNA structural features of S. cerevisiae promoter regions and gene expression variability is discussed, followed by the application of the structure-based promoter prediction algorithm ‘PromPredict’ to annotating promoter regions of six different eukaryotes. Finally, an analysis of the structural features of the sequences flanking the transcription factor binding sites (TFBSs) of six transcription factors, and of their relationship with DNA binding affinity, is discussed. Each of the projects described below appears as a separate chapter in the thesis.
An overview of the eukaryotic transcription machinery, promoter elements and different DNA structural properties is presented in the introduction of the thesis (chapter 1).
The structural properties of DNA in the promoter regions of eukaryotic genes (chapter 2)
Earlier studies in the lab reported that, apart from sequence motifs, promoter regions have distinct structural properties, such as lower stability, lower bendability and higher curvature, compared to other genomic regions. However, those studies were based on small datasets and a few model systems. Advances in high-throughput techniques have made transcription start site information available for many model systems. This work was initiated with the aim of investigating the structural features in different eukaryotic systems representing different eukaryotic lineages. A quantitative analysis of three different structural features of the promoter regions of six model systems (S. cerevisiae, C. elegans, D. melanogaster, zebrafish, mouse and human) has been carried out. Further, the composition of different k-mers (k = 3, 4 and 6), A-tracts and G-quadruplexes has been studied.
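The k-mer and G-quadruplex composition analysis mentioned above can be illustrated with a minimal sketch. It is not the thesis pipeline: the function names `kmer_counts` and `has_g_quadruplex_motif` are hypothetical, and the G-quadruplex pattern used here (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is a commonly used definition that the thesis may or may not have adopted. Only the given strand is scanned.

```python
import re
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in seq, skipping k-mers with non-ACGT symbols."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )

# Assumed motif definition: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (given strand only).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def has_g_quadruplex_motif(seq):
    """Return True if seq contains at least one putative G-quadruplex motif."""
    return G4_PATTERN.search(seq.upper()) is not None
```

Counting with k = 3, 4 and 6 over a set of promoter sequences and over a background set would then allow over- or under-represented oligomers to be compared between the two.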
The analysis allowed us to understand the similarities and differences in the structural features of promoter sequences in different model systems. The core promoter sequences of S. cerevisiae, C. elegans, D. melanogaster, zebrafish, mouse and human have been observed to be less stable and to have a lower preference for nucleosome formation. S. cerevisiae, C. elegans and D. melanogaster promoter sequences have been shown to be less bendable, whereas zebrafish, mouse and human promoter sequences are flexible in terms of bendability towards the major groove, as predicted by the DNase I sensitivity model. S. cerevisiae, C. elegans and D. melanogaster core promoter regions have AT-rich oligomers, whereas mouse and human core promoter regions have GC-rich oligomers and G-quadruplex motifs.
DNA structural features of TATA-containing and TATA-less promoters (chapter 3)
Eukaryotic genes can be broadly classified as TATA-containing or TATA-less, based on the presence of a TATA-box in their promoter sequences. Experiments on both classes of genes have reported that they differ in the regulation of gene expression and in cellular functions. In this chapter, the differences in the compositional and structural features of TATA-containing and TATA-less promoters in the above-mentioned model systems are discussed. The results suggest that the DNA structural features of TATA-containing and TATA-less promoters are distinctly different in all eukaryotes. The TATA-containing promoters are less stable, more flexible and more curved compared to TATA-less promoters in lower eukaryotes. In mouse and human genes, DNA duplex stability and G-quadruplex motifs strongly distinguish the two classes of promoters.
DNA structural properties of eukaryotic promoter regions and gene expression variability (chapter 4)
Gene expression is regulated by various external (environment and evolution) and internal (genetic) factors. The presence of sequence motifs, such as TFBSs and the TATA-box, as well as DNA methylation, has been implicated in the regulation of expression of some genes in vertebrates, but a large number of genes lack these sequences. Earlier analyses in S. cerevisiae (described in previous sections) have shown that its promoter sequences have special structural properties, such as lower stability, lower bendability and higher curvature, compared to other genomic regions. These structural features may play a role in transcription initiation and in the regulation of gene expression. This project was carried out to address two questions:
1. What is the relationship between DNA structural features and gene expression?
2. What is the relationship between gene expression and bidirectionality of a promoter region?
For this purpose, seven different measures of gene expression variability (stochastic noise, responsiveness, stress response, trans variability, mutational variance, interstrain variation and expression divergence) have been compared with structural features in the promoter regions. It is observed that a few of the variability measures of gene expression are linked to DNA structural properties, along with nucleosome occupancy, TATA-box presence and bidirectionality of promoter regions. Interestingly, gene responsiveness is shown to be the measure most closely correlated with DNA structural features and promoter architecture. The study highlights the importance of sequence-dependent structural features in gene regulation.
Promoter prediction in eukaryotes using DNA duplex stability (chapter 5)
Structural property-based algorithms can discriminate promoter sequences from non-promoter sequences and perform far better than sequence motif-based predictors. Compared to other structural features, low stability is found to be the most prevalent feature in promoter regions. “PromPredict” (an in-house algorithm) uses dinucleotide free energy values, obtained from the differential melting stability of DNA duplexes, as a predictor of promoters and has been used successfully in earlier work to annotate promoter sequences in prokaryotes and rice. A comprehensive analysis of the performance of PromPredict was carried out in S. cerevisiae, D. melanogaster, C. elegans, zebrafish, mouse and human, as well as in TATA-containing and TATA-less promoter regions of S. cerevisiae, using TSS data, and in 48 eukaryotic systems using translation start site (TLS) data. It revealed that differential stability is a good criterion for promoter prediction.
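The idea of using dinucleotide free energies to compare the stability of neighbouring regions can be sketched as follows. This is a simplified illustration and not the PromPredict implementation: the free energy table is the unified nearest-neighbour set of SantaLucia (1998), which is an assumption about the parameter set, and the function names, the window length and the way the two windows are placed are all hypothetical choices made for the example.

```python
# Unified nearest-neighbour free energies (SantaLucia, 1998), kcal/mol at 37 °C.
# Assumption: PromPredict's own parameter table may differ from this one.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}

def mean_free_energy(seq):
    """Average free energy per dinucleotide step (raises KeyError on non-ACGT)."""
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if not steps:
        raise ValueError("sequence must contain at least two nucleotides")
    return sum(steps) / len(steps)

def differential_stability(seq, start, window=50):
    """Mean free energy of seq[start:start+window] minus that of the next window.

    A positive value means the first window is less stable (less negative
    free energy) than the window immediately downstream of it.
    """
    first = seq[start:start + window]
    second = seq[start + window:start + 2 * window]
    if len(first) < window or len(second) < window:
        raise ValueError("not enough sequence for two full windows")
    return mean_free_energy(first) - mean_free_energy(second)
```

Scanning `start` along a genomic sequence and flagging positions where the difference exceeds a chosen cutoff would give a crude stability-based promoter signal; the cutoff itself would have to be calibrated on annotated data.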
DNA structure in flanking sequences of consensus motifs modulates transcription factor binding (chapter 6)
Sequence-specific DNA-protein interactions are essential for specific expression patterns during development. Several factors contribute to the DNA-binding specificities of transcription factors (TFs). They include the structure and flexibility of TFs, cofactors, the chromatin environment and the DNA sequence. Along with the actual transcription factor binding sites (TFBSs), their sequence context (flanking sequences) has also been shown to play a major role in gene regulation. Most studies have addressed the sequence context at a global level, but very little is understood about the role of sequences flanking TFBSs in the binding of transcription factors.
This project was initiated with the aim of understanding the effect of the sequences flanking TFBSs on transcription factor binding affinity. In vitro DNA binding information for six different transcription factors, with three types of DNA-binding domains (zinc finger: GATA4; homeodomain: AbdA, AbdB and Ubx; bZIP: Fos-Jun and Nfil3), was provided by Aseem Ansari’s lab. The compositional and structural features (minor groove width, propeller twist, wedge and free energy) were compared with the DNA binding profiles of 12-mers (or 8-mers) of the six transcription factors. It has been observed that some of the DNA structural properties of flanking sequences are strongly correlated with binding affinity. For GATA4 sequences, binding affinity is negatively correlated with GC content and with minor groove width in the 5′-flanking region, showing the significance of a narrow minor groove in the 5′ region. On the other hand, the binding affinity of bZIP proteins is negatively correlated with wedge angles, whereas for homeodomain proteins it is positively correlated with propeller twist and GC content. Thus, this study highlights the differential preference for flanking sequences outside the core binding motifs of six different TFs, all of which interact with DNA through an α-helix.
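The kind of comparison described above, relating a property of the flanking sequence to binding affinity, can be sketched with a plain Pearson correlation. This is an illustrative sketch only: the helper names `gc_content`, `five_prime_flank` and `pearson_r` are hypothetical, the way the flank is delimited (everything upstream of an assumed core start position) is an assumption, and the thesis may have used a different correlation measure or flank definition.

```python
import math

def gc_content(seq):
    """Fraction of G and C nucleotides in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def five_prime_flank(seq, core_start):
    """Return the part of seq that lies 5' of the core motif (assumed layout)."""
    return seq[:core_start]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Given a list of bound sequences and their measured affinities, one would compute `gc_content(five_prime_flank(s, core_start))` for each sequence and pass the resulting values, together with the affinities, to `pearson_r`; a negative coefficient would correspond to the kind of relationship reported for GATA4.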
‘The relationship between transcription pre-initiation complexes and gene expression variability in S. cerevisiae’ is briefly described in the appendix section of the thesis.
General conclusion
Overall, the results presented in this thesis indicate that DNA sequence-based structural features are unique to promoter regions and play an important role in gene regulation. Local structural features of the sequences flanking transcription factor binding sites are also instrumental in determining the DNA binding affinity of transcription factors. | en_US |