INTERPIN: identifying INtrinsic transcription TERminators, hairPINs in bacteria
Abstract
The conversion of DNA to RNA through transcription is an important step in the life cycle of every organism. It ensures that the genetic information in DNA is converted through RNA into instructions for the formation of functional molecules, such as proteins. The ability of RNA to fold onto itself, creating secondary and tertiary structures, supports biochemical activities such as catalysis, ligand or protein binding, control of gene expression, protein transport, translation, and other regulatory functions in the cell. The basic RNA secondary structure needed for intrinsic termination of transcription is the hairpin. The hairpin consists of a stem and a loop, and the location at which it forms is central to the tight and accurate regulation of transcription, a process with an essential role in the cell life cycle. There have been many in silico and experimental studies on the identification and analysis of RNA hairpins. Despite this, termination sites are known for only 30-40% of operons, and the analyses are limited to a few bacteria, including E. coli, M. tuberculosis, B. subtilis, and Streptomyces. All of these studies propose that a single hairpin structure (a canonical hairpin) is capable of stalling the RNA polymerase (RNAP) molecule to trigger the termination of transcription. This event constantly competes with RNA elongation and with the kinetic rates of hairpin folding. Through this work, we show that a group of hairpins in a cluster can also work in tandem to cause transcription termination. The size of the hairpin groups varies, and all hairpins in a cluster lie ≤15 bases from each other. Two-member clusters have the highest occurrence, and the frequency decreases exponentially with increasing cluster size. The overall occurrence of cluster hairpins was found to be higher than that of single hairpins in prokaryotes (58:42), which can be explained by kinetic and thermodynamic considerations.
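As a rough illustration of the cluster definition above, the following Python sketch groups predicted hairpins into clusters whenever neighbouring hairpins lie at most 15 bases apart. It is not the INTERPIN implementation. The `(start, end)` coordinate representation, the function name `group_hairpins`, and the choice to measure the gap from the end of the current cluster to the start of the next hairpin are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a hairpin is (start, end) on one strand.
Hairpin = Tuple[int, int]


def group_hairpins(hairpins: List[Hairpin], max_gap: int = 15) -> List[List[Hairpin]]:
    """Group hairpins whose gap to the running cluster is <= max_gap bases.

    The gap is taken as (start of next hairpin) - (largest end seen so far
    in the current cluster); this is an assumption of the sketch.
    """
    clusters: List[List[Hairpin]] = []
    cluster_end = 0
    for start, end in sorted(hairpins):
        if clusters and start - cluster_end <= max_gap:
            # Close enough to the current cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            # Too far away (or first hairpin): open a new group.
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def split_single_and_cluster(groups: List[List[Hairpin]]):
    """Separate one-member groups (single hairpins) from multi-member clusters."""
    singles = [g for g in groups if len(g) == 1]
    multi = [g for g in groups if len(g) > 1]
    return singles, multi
```

Under these assumptions, hairpins at `(10, 30)` and `(40, 60)` would form a two-member cluster, while a hairpin at `(100, 120)` would be counted as a single hairpin.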
Our predictions of intrinsic terminators in 13 bacterial genomes across 6 bacterial phyla matched in 72% of cases when cross-checked against terminator locations inferred from high-throughput RNA-seq data. Our method can predict hairpins in both AT-rich and GC-rich genomes. The energy scores of the predicted hairpins mostly fall within [-5,5] kcal/mol, and the hairpins lie within 50 base pairs of the stop codon, suggesting efficient termination. We did not find the poly U/A pattern to be a necessary feature of hairpins for driving intrinsic transcription termination. Even though the terminator hairpin sequences themselves are not conserved, the process of intrinsic termination is highly conserved, which can be explained by subtle differences in the features of hairpins across bacterial phyla.
To disseminate our work, we have archived the results in a public database named INTERPIN, which is the largest collection of intrinsic transcription termination units in bacteria. The database covers 12,745 bacterial genomes from 10 different bacterial phyla, namely Firmicutes, Chlamydia, Actinobacteria, Spirochaetes, Planctomycetes, Fusobacteria, Cyanobacteria, Thermodesulfobacteria, Acidobacteria, and Proteobacteria (which is divided into α-, β-, γ-, δ-, ε-, and other proteobacteria), and approximately 2.5 × 10^7 operons. The database provides a one-stop solution for obtaining bacterial intrinsic termination predictions as well as visualizing them. We also predicted hairpins in 27,938 bacterial plasmids. In both chromosomes and plasmids, hairpins were predicted in >90% of interoperonic regions (IRs), and cluster hairpins formed ∼58% of the total pool of predicted hairpins. We analyzed these hairpins across bacterial chromosomes and plasmids, examining the relationship of hairpin energy with stem/loop lengths, distance from the stop codon, GC content, and the occurrence of cluster versus single hairpins, and noted differences and similarities of hairpins across bacterial phyla. We have also delineated alternate transcription termination sites, which add another layer of regulation to prevent read-through of RNA transcripts in case the first, default terminator is inadequate for terminating transcription. We found these sites in >80% of IRs, with 2-10 additional termination sites present downstream of the first detected intrinsic terminator. The alternate termination sites fall into two categories: first, where an intrinsic hairpin terminator was found downstream of the first intrinsic terminator, and second, where intrinsic terminators were found downstream of rho-termination sites.
Further, we have exhaustively compared our results with those available from other software such as WebGeSTer, which makes similar predictions but uses a distinct hairpin detection approach. Through this analysis, we were able to identify common determinants of intrinsic transcription termination in bacteria, that is, distinguishing features that are conserved across the hairpins predicted by the two programs. These include localization of hairpins close to the stop codon (within 50 bp), energy scores within [-20,5] kcal/mol, and the non-essentiality of a poly U pattern at the hairpin base for effecting transcription termination. The studies were supplemented by validation against the corpus of experimentally derived intrinsic terminators described in the literature. In summary, a new group of terminators, cluster hairpins, has been identified and characterized, which is expected to fill the knowledge gap on intrinsic termination sites in bacteria. The predictions provided can help advance microbial genome annotation, guide experimental design strategies and technology development, and provide potential targets for drug discovery studies.