dc.description.abstract | Transcription initiation is an important step in the process of gene regulation in prokaryotes. Promoters are stretches of DNA sequence located upstream of transcription start sites (TSSs), where RNA polymerase and other transcription factors bind to initiate transcription. Recent advances in sequencing technologies have produced a huge amount of raw data in the form of whole genome sequences. This sequence data has to be annotated in order to identify coding, non-coding and regulatory regions. Computational tools are useful for quick and fairly reliable annotation of many genome sequences. Promoter prediction is an important step in the genome annotation process. It is needed not only for the validation of predicted genes, but also for the identification of novel genes, especially those encoding non-coding RNAs, which are missed by gene prediction programs. DNA sequence dependent structural properties such as DNA duplex stability, bendability and intrinsic curvature have been found to be associated with promoter regions in all domains of life. The work presented in this thesis focuses on the analysis of these structural features in the promoter regions of published prokaryotic transcriptome data. Furthermore, promoters were predicted using these structural features, and the role of these features in gene expression was studied. The organization of the thesis is as follows. An overview of the transcription machinery of prokaryotes, promoter architecture, available promoter prediction programs and sequence dependent structural features is presented in chapter 1.
Chapter 2 describes the datasets and methods used in the entire study.
Structural features of promoters associated with primary and operon TSSs of H. pylori 26695 genes and their orthologs (chapter 3)
Promoter regions in genomic sequences from all domains of life show similar trends in their structural properties such as stability, bendability and curvature. This chapter discusses the DNA duplex stability and bendability of various classes of promoter regions in Helicobacter pylori strain 26695. The classes are based on the different types of transcription start sites (primary, secondary, internal, operon TSSs, etc.) identified in a transcriptome study. It is found that the promoters associated with primary TSSs and operon TSSs show significantly strong structural features. The DNA free energy based promoter prediction tool PromPredict has been used to annotate promoters of the different classes, and high recall values (80%) are obtained for primary TSSs. Orthologous genes from 10 different strains of H. pylori show conservation of structural properties in promoter regions as well as coding regions. PromPredict annotates promoters of orthologous genes with very high recall and precision values. DNA duplex stability of the promoter region is conserved among the orthologous genes across the 10 H. pylori strains.
Sequence dependent structural features of promoters in prokaryotic transcriptome (chapter 4)
Next-generation sequencing studies have revealed that a wide range of transcripts, such as primary, internal, antisense and non-coding RNA, are present in the prokaryotic transcriptome and that a large fraction of them are functionally involved in various regulatory activities. Identification of the promoters associated with different transcripts is important for characterization of the transcriptome. The current chapter discusses DNA sequence dependent structural properties such as stability, bendability and curvature in the promoter regions of six different prokaryotic transcriptomes (Helicobacter pylori, Anabaena, Synechocystis, Escherichia coli, Salmonella and Klebsiella). Using these structural features, promoters associated with the different categories of transcripts, which constitute an integral part of the transcriptome, were predicted. Promoter annotation using structural features is fairly accurate and reliable compared with motif-based approaches, since the different categories of transcripts show poor sequence conservation in the promoter region. Most importantly, it is universal in nature, unlike sequence-based approaches, which are generally organism specific.
Role of sequence dependent structural properties in gene expression in prokaryotes (chapter 5)
DNA duplex stability, bendability and intrinsic curvature play crucial roles in the process of transcription initiation. Hence, in order to understand the relationship between these structural features and gene expression, the relative differences in stability, bendability and curvature in the promoter regions of highly and lowly expressed genes were studied. It is found that these features are relatively accentuated in the promoter regions associated with high gene expression compared with those associated with low gene expression. Promoter regions associated with high gene expression are annotated more reliably using DNA structural features than those associated with low gene expression.
Sequence dependent structural properties in the promoter region of essential and non-essential genes of the prokaryotes (chapter 6)
Essential genes are the minimal possible set of genes required for the survival of an organism. These sets of genes can be identified by experiments such as single gene deletion and transposon mediated inactivation. Here, the analysis of DNA duplex stability and bendability in the promoter regions of essential and nonessential genes of prokaryotes is reported. It is found that the average free energy and bendability profiles are distinct in the promoter regions of essential and nonessential genes. Whole genome promoter predictions for essential and nonessential genes have also been carried out using the in-house program PromPredict.
Chapter 7 presents the summary and conclusions of the entire thesis work, followed by future perspectives in the field.
Optimization of PromPredict algorithm and updating PromBase with newly sequenced genomes (Appendix A)
PromPredict is an in-house program based on the relative stability of the DNA in flanking regions. It was found to perform well in predicting promoters across all organisms. In previous studies, it was observed that for organisms with low genomic GC content (<35%), promoter prediction resulted in low precision values, which indicates a higher false positive rate. The threshold values of the PromPredict algorithm were revised in order to lower the false positive rate. PromBase is a comparative genomics database of microbial genomes. It stores different genomic and structural properties of the microbial genomes. It also displays the predictions obtained from PromPredict in graphical as well as tabular format. Newly sequenced genomes were downloaded from NCBI, processed using in-house programs and added to the MySQL database (the back end of PromBase). Stability profiles for predictions were also added for the RNA coding genes; previously, only profiles for protein coding genes were displayed.
Comparative genomics of asymmetric gene orientation in prokaryotes (Appendix B)
Transcription proceeds in the 5’ to 3’ direction, which gives each gene a directionality. Prokaryotic genomes show asymmetry in gene orientation on the leading and lagging strands. The different phyla of prokaryotes were analyzed in terms of asymmetry in gene orientation. It is found that organisms belonging to the phylum Firmicutes, which are known to have a different DNA polymerase system for replication, show high asymmetry in gene orientation. | en_US |