Structure Function Studies Of Biologically Important Simple Repetitive DNA Sequences
Abstract
The recent explosion of DNA sequence information has provided compelling evidence for two facts: (1) simple repetitive sequences (microsatellites and minisatellites) occur commonly in the human genome, and (2) these repetitive DNA sequences could play an important role in the regulation of various genetic processes, including modulation of gene expression. These sequences exhibit extensive polymorphism in both length and composition between species, between organisms of the same species, and even between cells of the same organism. The repetitive DNA sequences also exhibit structural polymorphism that depends on sequence composition.
The functional significance of repetitive DNA is well established: work done in many laboratories, including ours, has documented the functional role played by repetitive sequences in various cellular processes. Structural studies have established the sequence requirements for various non-B DNA structures, and the functional significance of these unusual DNA structures is becoming increasingly clear. Structures that were earlier characterised purely from a conformational point of view have aroused renewed interest after the recent realisation that they can form in vivo when the corresponding sequences are cloned in a supercoiled plasmid. The discovery of a novel type of dynamic mutation, in which intragenic amplification of trinucleotide repeats is associated with phenotypic changes causing many neurodegenerative disorders, has provided the most compelling evidence for the importance of simple repeats in the etiology of these disorders. The secondary structures adopted by these simple repeats are a common causative factor in the mechanism of their expansion. This realisation prompted many investigations into the relationship between DNA sequence, structure and the molecular basis of dynamic mutation. Several lines of experimental evidence have implicated paranemic DNA structures in various biological processes, especially in the regulation of gene expression.
Earlier work done in our laboratory on the structure-function relationship of repetitive DNA sequences provided experimental evidence for the role of paranemic DNA structures in the regulation of gene expression. It was demonstrated that sequences with the potential to form intramolecular triplexes, when present within a gene, downregulate its expression in vivo (Sarkar and Brahmachari (1992) Nucleic Acids Res., 20, 5713-5718). The effect of cruciform-forming sequences on gene expression was also documented. Sequence-specific alterations in DNA structure were studied in our laboratory using a variety of biophysical and biochemical techniques. An intramolecular, antiparallel tetraplex structure was proposed for human telomeric repeat sequences (Balagurumoorthy et al. (1994) J. Biol. Chem., 269, 21858-21869).
Telomeric repeats are present not only at the ends of chromosomes but also at many interstitial sites in the human genome. Database searches reveal that human telomeric sequences, as well as similar sequences with minor variations, are present at many locations in the human genome. Telomeric repeats are GC-rich sequences, with the G-rich strand protruding as a 3' overhang at the ends of chromosomes. When human telomeric repeats are cloned in a supercoiled plasmid, the C-rich strand adopts a hairpin-like conformation whereas the G-rich strand extrudes into a quadruplex structure. However, the biological significance of these structures in vivo remains to be elucidated completely. A role has been suggested for a putative tetraplex DNA structure, located in the insulin-linked polymorphic region of the human insulin gene, in regulating expression of that gene in vivo. In this context, we asked whether telomeric repeats, when present within a gene, affect its expression in vivo and, if so, by what mechanism. The studies undertaken to understand this effect are presented in Chapter 2 of this thesis.
In contrast to telomeric repeats, which provide stability to chromosomes, expansion of a GC-rich dodecamer repeat upstream of the cystatin B gene (chromosome 21q) has recently been shown to be the most common mutation associated with progressive myoclonus epilepsy of the Unverricht-Lundborg type (EPM1). Two to three copies of the repeat (CCCCGCCCCGCG)n are present in normal individuals, whereas affected individuals have 30-75 copies. The expression of the cystatin B gene is reduced in patients in a cell-specific manner. The repeat also shows intergenerational variability. The exact mechanism of expansion of this repeat is not known. In the case of trinucleotide repeat expansion, it has been shown that the structure adopted by the repeat plays an important role in the mechanism of expansion and that some of the secondary structures adopted by trinucleotide repeats could be inherently mutagenic conformations. To understand the mechanism of expansion of the EPM1 dodecamer repeat, the work reported in this thesis was carried out with the following objectives.
• To understand the structure of the G-rich and C-rich strands of the EPM1 repeat.
• To understand the variations in the structure with increasing repeat length and their possible implications for the mechanism of expansion of the EPM1 repeat.
Studies addressing these objectives are presented in Chapters 3, 4 and 5 of this thesis.
Chapter 1 provides a general introduction to repetitive DNA, the various structures adopted by repetitive DNA sequences in the genome, and the functional significance of the various simple repetitive DNA sequences. It gives an account of trinucleotide repeat expansions, non-trinucleotide repeat expansions and their associated disorders. The various non-B DNA structures adopted by these repeats and their implications for the mechanism of expansion are discussed.
Chapter 2 describes the in-frame cloning of the human telomeric repeat d(G3T2A)3G3 in the N-terminal region of the β-galactosidase gene. The effect of such repeat sequences on transcription elongation in vivo was studied using E. coli as a model system. The 3.5 copies of the human telomeric repeat sequence were cloned in the sense strand of plasmid pBluescriptIISK+ to create plasmid clone pSBQ8, and in the template strand of plasmid pBluescriptIIKS+ to create clone pSBRQ8.
One-dimensional chloroquine gel shift assays indicated the presence of an unwound structure in pSBQ8 and pSBRQ8. β-galactosidase activity assays suggested downregulation of the gene in vivo. For pSBQ8, β-galactosidase activity differed approximately 6-fold from that of the parent plasmid pBluescriptIISK+, whereas for pSBRQ8 it differed approximately 8-fold from that of the control pBluescriptIIKS+. Analysis of the β-galactosidase transcript showed that the full-length transcript was formed in the case of pSBQ8 but not in the case of pSBRQ8. We propose that in pSBQ8 gene expression is inhibited at steps subsequent to transcription elongation, whereas in pSBRQ8 a quadruplex structure may be formed by the template strand at the DNA level, thereby blocking the transcription elongation step.
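The shorthand d(G3T2A)3G3 used for the Chapter 2 insert can be expanded mechanically. The following is a minimal illustrative sketch, not part of the reported work: the helper name `expand_shorthand` is hypothetical, and the parser assumes only the simple "base plus optional count" and "(unit)count" forms seen in this abstract. It shows that the insert is 21 nucleotides long, which corresponds to 3.5 copies of the six-nucleotide unit GGGTTA, and that it contains four runs of three guanines.

```python
import re


def expand_shorthand(notation: str) -> str:
    """Expand shorthand such as 'd(G3T2A)3G3' into a plain base string.

    Hypothetical helper: handles only '(unit)count' groups and bare
    'base[count]' tokens, which is all this abstract uses.
    """
    def expand_unit(unit: str) -> str:
        # 'G3T2A' -> 'GGGTTA': each base is followed by an optional count
        return "".join(base * int(count or 1)
                       for base, count in re.findall(r"([ACGT])(\d*)", unit))

    # Drop the leading 'd' (deoxy) prefix if present
    body = notation[1:] if notation.startswith("d") else notation
    parts = []
    # Each match is either a parenthesised group with a repeat count
    # or a bare base with an optional count
    for group, reps, bare in re.findall(r"\(([ACGT\d]+)\)(\d*)|([ACGT]\d*)", body):
        if group:
            parts.append(expand_unit(group) * int(reps or 1))
        else:
            parts.append(expand_unit(bare))
    return "".join(parts)


insert = expand_shorthand("d(G3T2A)3G3")
copies = len(insert) / len("GGGTTA")      # repeat copies in the insert
g_runs = re.findall(r"G{3,}", insert)     # runs of three or more guanines

print(insert, len(insert), copies, len(g_runs))
```

Run as written, this prints `GGGTTAGGGTTAGGGTTAGGG 21 3.5 4`, in agreement with the "3.5 copies" stated for the cloned insert.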
Chapter 3 describes studies aimed at understanding the structure of the G-rich strand (referred to as the G strand) of the Progressive Myoclonus Epilepsy (EPM1) repeat. The sequence of the G strand of the dodecamer EPM1 repeat is d(GGGGCGGGGCGC)n. Oligonucleotides containing one (12-mer), two (24-mer) and three (36-mer) repeat units were synthesised. These oligonucleotides are referred to as dG12, dG24 and dG36 respectively. Structural studies were carried out using CD spectroscopy, UV melting, non-denaturing gel electrophoresis, and chemical and enzymatic probing.
The G strand oligonucleotides showed enhanced gel electrophoretic mobility in the presence of the monovalent salts KCl and NaCl. Oligonucleotide dG12 also showed a retarded species on the non-denaturing gel in the presence of 70 mM KCl, indicating intermolecular association. Oligonucleotides dG24 and dG36 predominantly formed intramolecular structures, which migrated anomalously faster than expected for their size. The CD spectrum of dG12 showed an intense positive band at 260 nm and a negative band at 240 nm in the presence of KCl, indicative of an intermolecular, parallel G-quartet structure. The CD spectra of dG24 and dG36 showed a positive peak at 260 nm and a negative peak at 240 nm, along with a positive band around 290 nm, which is indicative of a folded-back structure. These findings support the results of non-denaturing gel electrophoresis of the G strand oligonucleotides. The UV melting profiles suggested an increase in stability with increasing length. These structures were further characterised by P1 nuclease digestion and by chemical probing using DMS and DEPC.
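The relationship between the two strands and the oligonucleotide design can be checked with a short sketch. It is illustrative only and makes two assumptions that are not taken from the thesis: the function names are hypothetical, and counting runs of three or more guanines is used as a rough rule of thumb for quadruplex potential (an intramolecular quadruplex is generally taken to need four such runs), not as the method used in the reported experiments.

```python
import re

C_UNIT = "CCCCGCCCCGCG"  # C-strand repeat unit as written in the text
G_UNIT = "GGGGCGGGGCGC"  # G-strand repeat unit as written in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_rotation(a: str, b: str) -> bool:
    """True if a is a cyclic rotation of b, i.e. the same tandem repeat
    read from a different starting position."""
    return len(a) == len(b) and a in b + b


def g_runs(seq: str, min_len: int = 3) -> list:
    """List runs of at least min_len consecutive guanines (heuristic)."""
    return re.findall("G{%d,}" % min_len, seq)


# Oligonucleotides with one, two and three repeat units
oligos = {f"dG{12 * n}": G_UNIT * n for n in (1, 2, 3)}

# The G-strand unit is the reverse complement of the C-strand unit,
# up to the choice of where the tandem repeat is taken to start
print(is_rotation(reverse_complement(C_UNIT), G_UNIT))

for name, seq in oligos.items():
    print(name, len(seq), len(g_runs(seq)))
```

Under this heuristic dG12 carries two guanine runs while dG24 and dG36 carry four and six, which is at least consistent with the observation that dG12 associates intermolecularly whereas the longer oligonucleotides can fold back on themselves.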
The structural studies on the G-rich strand of the EPM1 dodecamer repeat showed that this repeat motif adopts intramolecularly folded structures as the length of the repeat increases, thereby favouring slippage during replication.
Chapter 4 deals with studies aimed at understanding the structure, at acidic pH, of the C-rich strand (referred to as the C strand) of the Progressive Myoclonus Epilepsy (EPM1) repeat. The sequence of the C strand of the dodecamer EPM1 repeat is d(CCCCGCCCCGCG)n. C-rich oligonucleotides are known to form a four-stranded structure called the i-motif at acidic pH, involving intercalated base pairs. The i-motif consists of two parallel-stranded, base-paired duplexes arranged in an antiparallel orientation. Because the base pairs of one duplex intercalate into those of the other, the structure is called the i-motif. We have investigated the structure of the C strand of the EPM1 repeat by circular dichroism (CD), native polyacrylamide gel electrophoresis and UV melting.
Oligonucleotide dC12 showed two bands, of which the major band was retarded on the native gel (pH 5.0) at low temperature, suggesting that dC12 predominantly formed an intermolecular structure. Oligonucleotides dC24 and dC36 migrated anomalously faster than expected for their size, indicating the formation of compact, intramolecularly folded structures. Circular dichroism studies indicated that all the oligonucleotides displayed an intense positive band near 285 nm and a negative band around 260 nm, with a crossover at 270 nm. This is a characteristic CD signature of an i-motif structure and reflects the presence of secondary structure due to the formation of hydrogen-bonded pairs between protonated cytosines. All the C strand oligonucleotides showed hyperchromism at 265 nm, which is an isosbestic wavelength for C protonation. The studies described in this chapter suggest an intramolecular i-motif structure for dC24 and dC36 and an intermolecular i-motif for oligonucleotide dC12.
In addition, it was interesting to note that, in spite of the presence of G residues, the stretches of C residues could adopt an i-motif structure. Although these structures are formed at acidic pH, they indicate the possible formation of intramolecularly folded structures.
Many reports have suggested the possibility of cytosine-rich sequences adopting an i-motif structure even at neutral pH. To test this possibility, structural studies were carried out on the C strand EPM1 oligonucleotides at pH 7.2 in the presence of 70 mM NaCl. These studies are described in Chapter 5. The investigations were done using CD spectroscopy, UV melting, native polyacrylamide gel electrophoresis, and probing with hydroxylamine and P1 nuclease. These studies indicate that all the C strand oligonucleotides form intramolecular hairpin structures at physiological pH. All three C strand oligonucleotides migrated anomalously fast on the native gel, indicating the presence of a compact structure. The CD spectra at pH 7.2 showed a blue shift compared to those at pH 5.0, indicating the absence of the protonated cytosine base pairs characteristic of the i-motif. The hydroxylamine chemical probing suggested the presence of G-C Watson-Crick base pairs. The loop residues of the folded-back hairpin structures were probed with P1 nuclease. The C strand oligonucleotides showed the possibility of forming multiple hairpin structures as the length of the repeat increases.
The propensity to form hairpin structures suggests that slipped-loop structures may form during replication, thereby promoting expansion of this repeat. The formation of folded-back, hairpin-like structures is therefore significant for the mechanism of expansion of this repeat.
Chapter 6 is devoted to concluding remarks highlighting the significance of the experimental results presented in this thesis and their possible biological implications in the light of contemporary research.