Intrinsic Versus Induced Variations In DNA Structure
Abstract
The binding of different proteins involved in processes such as transcription, replication and chromatin compaction to regions of the genome is regulated by the structure of DNA. Thus, DNA structure acts as the crucial link modulating evolutionary selection of the DNA sequence based on its own function, and the function of the proteins it encodes. The aim of this work is to examine the role of intrinsic, sequence-dependent structural variations vis-à-vis the protein-induced variations, in allowing DNA to assume geometries necessary for binding by proteins. For this purpose, we carried out analyses of datasets of X-ray crystal structures of free and protein-bound DNA, and molecular dynamics simulation studies of a few free DNA structures and a protein-DNA complex. Each of the projects described below will appear as a separate chapter in the thesis.
Analysis of X-ray crystal structure datasets
Dataset of high-resolution X-ray crystal structures of free and protein-bound DNA
This project was initiated with the aim of investigating the variation in A- and B-forms of DNA and the role they play in the binding of proteins. However, a survey of the existing literature indicated that the terms ‘A-DNA’ and ‘B-DNA’ were being used rather loosely and several different parameters at the local structural level were being used by various investigators to characterise these structures. Hence a systematic study was taken up to analyse all high-resolution free DNA structures comprising a sufficient number of contiguous Watson-Crick basepairs, irrespective of how they were classified by the existing databases. We also carried out a study of double-helical, Watson-Crick basepaired, free RNA structures for comparison. The structures in the RNA dataset were observed to rigidly assume the A-form and hence the average values of different parameters for that dataset were used to characterise the A-form. The analysis of free DNA and RNA structures was accompanied by an analysis of protein-bound DNA crystal structures. DNA structures bound to the helix-turn-helix motif in proteins were also analysed separately.
The analysis of free DNA and RNA structures allowed us to pinpoint the parameters suitable for discriminating A- and B-forms of DNA at the local structural level. The results illustrated that the free DNA molecule, even in the crystalline state, samples a large amount of conformational space, encompassing both the A- and the B-forms. Most protein-bound DNA structures, including those with large, smooth curvature, were observed to assume the B-form. The A-form was observed to be limited to a small number of dinucleotide steps in DNA structures bound to proteins belonging to a few specific families. Thus our study highlighted the structural versatility of B-form DNA, which allowed it to take up a range of global geometries to accommodate most DNA-binding protein motifs.
Dataset of X-ray crystal structures of the nucleosome
The study of high-resolution structures of free and protein-bound DNA was followed by an analysis of a dataset of X-ray crystal structures of the nucleosome, which is the fundamental repeating unit of the eukaryotic chromosome, and has been shown to play an important role in transcription regulation. Our results indicated that there is an ensemble of dinucleotide- and trinucleotide-level parameters that can give rise to similar global nucleosome structures. We therefore raise doubts about the use of the best-resolved nucleosome structure as the template to calculate the energy required by putative nucleosome-forming sequences for adopting the nucleosome structure. Based on our results, we have proposed that the local- and global-level structural variability of DNA may act as a significant factor influencing the formation of nucleosomes in the vicinity of high-plasticity genes, and in determining the probability of binding by regulatory proteins.
Molecular dynamics simulation studies of free and protein-bound DNA structures
The analysis of crystal structure databases was complemented by molecular dynamics (MD) studies to investigate the dynamic evolution of the DNA structure in its free and protein-bound states. The following three simulation studies were carried out:
Study to examine the biological relevance of the presence of 5-methyl group in thymine nucleotides
We investigated the biological relevance of the 5-methyl group in thymine nucleotides by comparing molecular dynamics simulations of structures with the sequences d(CGCAAAUUUGCG)2 and d(CGCAAATTTGCG)2. Our results showed that the presence of the thymine 5-methyl group was necessary for the A-tract to assume characteristic properties such as a narrow minor groove. It was also shown to modulate local-level structural parameters and, consequently, the curvature of the longer DNA fragment in which the A-tract was embedded. The analysis also provided a possible explanation for the experimentally observed interaction of A-tracts with drugs and DNase-I in the presence and the absence of the thymine 5-methyl group.
This project was the first of a series of MD studies, and hence several protocols were tested before settling on the final protocol. Simulations were carried out on both structures using the Berendsen temperature equilibration scheme as well as the Langevin temperature equilibration scheme. The Langevin scheme was found to be unsuitable for nucleic acid simulations, as it caused long-term and possibly permanent disruption of the double-helical structure at the terminal and the neighbouring two positions in the sequence. The Berendsen scheme was not observed to cause such disruptions. Simulations were also carried out on both structures with and without randomising the initial ion positions. The position of minimum electrostatic potential, where AMBER8 placed the first counterion, was observed to act as a minimum energy trap from which the counterion could not escape even during the course of several nanoseconds of simulation. Hence, the actual simulations were carried out using the Berendsen temperature equilibration scheme, and after randomisation of initial ion positions. The results of protocol testing have been reported in an appendix.
Study of DNA bending and curvature
An analysis of DNA bending and curvature was carried out by MD simulation on structures of three sequences, each approximately thirty basepairs long, namely d(G-3(CA4T4G)-C)2, d(G-3(CT4A4G)-C)2 and d(T-GACTA5T-GACTA6T-GACTA5T-G). For each molecule, snapshots belonging to a particular global geometry (linear, curved, bent in a particular direction etc.) were grouped together, and the average values of the dinucleotide step parameters for the different groups were compared. It was observed that for all three molecules, the average values for groups corresponding to different global geometries were within 1σ of each other, indicating that ensemble average values of dinucleotide-level parameters are incapable of predicting the global geometry of a DNA molecule.
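The grouping-and-comparison step described above can be illustrated with a minimal sketch. It is not the protocol used in the thesis: the function names, the toy roll values and the reading of "within 1σ" as "the difference between two group means does not exceed the smaller of the two group standard deviations" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def group_stats(snapshots):
    """Return {geometry_label: (mean, sample standard deviation)}.

    `snapshots` is a list of (geometry_label, parameter_value) pairs,
    one pair per MD snapshot, for a single dinucleotide step parameter.
    """
    groups = defaultdict(list)
    for label, value in snapshots:
        groups[label].append(value)
    return {label: (mean(vals), stdev(vals)) for label, vals in groups.items()}


def means_within_one_sd(stats):
    """True if every pair of group means differs by at most the smaller SD.

    This pairwise criterion is an assumption; other definitions of
    "within one standard deviation" are possible.
    """
    for (mean_a, sd_a), (mean_b, sd_b) in combinations(stats.values(), 2):
        if abs(mean_a - mean_b) > min(sd_a, sd_b):
            return False
    return True


# Hypothetical roll values (degrees) for one step; not data from the thesis.
toy_snapshots = [
    ("linear", 2.0), ("linear", 4.0), ("linear", 6.0),
    ("curved", 3.0), ("curved", 5.0), ("curved", 7.0),
]
toy_stats = group_stats(toy_snapshots)
indistinguishable = means_within_one_sd(toy_stats)
```

In this toy case the two groups have different global labels but overlapping parameter distributions, which is the situation the paragraph describes: the group averages alone do not separate the geometries.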
Study of the TraR-Trabox complex
The study on DNA bending and curvature was followed by simulations of a protein-DNA complex comprising the bacterial quorum sensing transcription factor TraR and its promoter region, known as Trabox. Simulations of a protein-free wild-type Trabox and a Trabox with two mutations in the spacer region were also carried out. Grouping of DNA snapshots in all three simulations based on average values of dinucleotide parameters in the spacer region shows how selection of the ‘right’ DNA geometry by proteins works at several levels. The number of snapshots of free mutated Trabox assuming a geometry favourable for protein-binding in terms of average twist alone is less than one-fourth of the corresponding number for free wild-type Trabox. When one applies further selection criteria in terms of other parameters such as roll and slide, the number of mutated Trabox snapshots with a geometry favourable for protein-binding drops to less than 0.5% of the total number of MD snapshots. Thus our results highlight how sequence-dependent changes in the structure of DNA regions adjacent to those that directly hydrogen-bond to proteins can also critically influence processes such as transcription.
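The multi-level selection described above, first on twist and then on further parameters, amounts to counting the snapshots that pass progressively stricter filters. The sketch below shows that counting under stated assumptions: the window values, the parameter names used as dictionary keys and the four toy snapshots are hypothetical and are not taken from the thesis.

```python
def in_window(value, window):
    """True if `value` lies inside the closed interval `window` = (low, high)."""
    low, high = window
    return low <= value <= high


def fraction_passing(snapshots, windows):
    """Fraction of snapshots whose parameters all fall in their windows.

    `snapshots` is a list of dicts mapping a parameter name to its
    spacer-region average in that snapshot; `windows` maps a parameter
    name to an assumed (low, high) range favourable for protein binding.
    """
    if not snapshots:
        return 0.0
    passing = sum(
        1
        for snap in snapshots
        if all(in_window(snap[name], win) for name, win in windows.items())
    )
    return passing / len(snapshots)


# Hypothetical spacer-region averages: twist and roll in degrees, slide in angstroms.
toy_snapshots = [
    {"twist": 34.0, "roll": 2.0, "slide": 0.0},
    {"twist": 35.0, "roll": 9.0, "slide": 0.1},
    {"twist": 30.0, "roll": 1.0, "slide": -0.2},
    {"twist": 36.0, "roll": 3.0, "slide": 1.5},
]

# Illustrative selection windows, not the criteria used in the thesis.
twist_only = {"twist": (33.0, 37.0)}
all_criteria = {"twist": (33.0, 37.0), "roll": (0.0, 5.0), "slide": (-0.5, 0.5)}

frac_twist = fraction_passing(toy_snapshots, twist_only)
frac_all = fraction_passing(toy_snapshots, all_criteria)
```

Each added criterion can only keep or shrink the passing set, so the fraction falls as the selection tightens, which mirrors the drop reported for the mutated Trabox.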
General Conclusion
Overall, our results indicate that intrinsic, sequence-dependent structural variations in free B-DNA allow it to sample a large volume of the double-helical conformational space, and assume global geometries that can accommodate most DNA-binding proteins.