dc.description.abstract | Heterogeneous nuclear ribonucleoproteins (hnRNPs) are a class of proteins initially discovered to protect nascent transcripts before they develop into primary mRNAs. They are involved in almost all stages of mRNA metabolism, from transcriptional control in the nucleus and alternative splicing to mRNA turnover in the cytoplasm and post-transcriptional modification of the mature mRNA. They form a heterogeneous group of about 20 proteins with molecular weights ranging from 34 kDa to 120 kDa. hnRNPs contain several RNA-binding domains (RBDs), for example, the RNA recognition motif (RRM) domain, the K-homology (KH) domain, and arginine-glycine-glycine (RGG) boxes. They also contain different auxiliary domains, i.e., regions rich in glycine, proline, and charged amino acids. The different RBDs and auxiliary domains create a modularity that adds to the functional diversity of the hnRNPs. hnRNPs also contain long intrinsically disordered regions (IDRs): sequence analysis showed that IDRs make up more than 50% of the primary sequence of hnRNPs. Many hnRNPs are known to phase separate because of their IDRs. Some have been reported to undergo liquid-liquid phase separation (LLPS), and others have been found in stress granules or proteinaceous membrane-less organelles (PMOs), all of which form through LLPS. This thesis focuses on understanding the nucleic acid interactions and phase separation behavior of hnRNPK. hnRNPK is primarily a nucleic acid binding protein with multiple functions inside the cell. It can activate transcription by acting as a transcription factor and can also act as a signaling factor. It is known to be involved in the development of osteoclasts and osteoblasts, which are part of the skeletal system. Like other hnRNPs, it is involved in mRNA protection, and it is also known to bind long non-coding RNAs (lncRNAs).
hnRNPK binds to non-coding RNAs (ncRNAs) and thereby carries out a variety of functions inside the cell, viz., regulation of gene transcription, regulation of the nuclear localization of specific lncRNAs, and regulation of translation. During cellular stress, p53, the guardian of the genome, triggers a pro-apoptotic pathway that requires the interaction of hnRNPK with a lncRNA called lincRNA-p21. This interaction is necessary to mediate transcriptional repression of p53 target genes that lead to apoptosis. A study showed that hnRNPK and lincRNA-p21 physically interact, and preliminary data showed that the 5’ end of lincRNA-p21 interacts with hnRNPK. However, no study has revealed which domain or region of hnRNPK is involved in the lincRNA-p21 interaction, or why the 5’ end of lincRNA-p21 is important for it.
One part of the thesis focuses on understanding the hnRNPK and lincRNA-p21 interaction in detail, using biophysical methods. Because lincRNA-p21 is more than 3 kb long, we carried out a detailed bioinformatic analysis of its sequence features before probing the RNA-protein binding. LncRNAs are generally characterized by poor sequence conservation, whereas a functional domain in a lncRNA is marked by sequence and structural conservation across orthologous sequences in different species. Hence, an analysis of sequence and structure conservation can reveal the functionally conserved domains of lncRNAs. LincRNA-p21 is ~4 kb in length and is situated between the Srsf3 and Cdkn1a/p21 protein-coding genes (hence the name lincRNA-p21). Using the UCSC and Ensembl genome browsers, we showed that the whole lincRNA-p21 sequence has poor sequence conservation in non-mammalian vertebrates. We performed BLAT and BLASTN searches using individual segments of lincRNA-p21 and found a few low-scoring hits in mammalian orthologs but no hits in non-mammals. We therefore concluded that, if orthologous sequences exist in non-primates, a sequence search alone is not enough to find them. Hence, we used Infernal, a tool that builds covariance models based on the sequence and secondary structure conservation of RNAs. We showed that lincRNA-p21 has two domains, at the 5’ and 3’ ends (named domain A and domain B), that are conserved in primates. Using PHYLIP and EvoNC, we studied the evolutionary rate of lincRNA-p21 compared to its neighboring gene Cdkn1a. Phylogenetic analysis also showed that the two conserved domains have discrete evolutionary dynamics in lincRNA-p21 orthologs, with slower rates in primates, likely due to functional constraints. Our analysis revealed that the 5’ end of lincRNA-p21 is one of the conserved domains, thus providing a basis for experimentally probing the lincRNA-p21 and hnRNPK interaction.
Through sequence analysis, we identified a conserved C-patch (cytosine patch) motif at the 5’ end of lincRNA-p21 and hypothesized that it could be a binding site for the hnRNPK protein. To test this, we purified hnRNPK protein domains and lincRNA-p21 sequences. We optimized the expression and purification of the full-length hnRNPK protein (hnRNPK-FL), the KH1+KH2 tandem domain, and the KH3 domain of hnRNPK.
Using isothermal titration calorimetry (ITC) and nuclear magnetic resonance (NMR) spectroscopy, we probed the interaction of hnRNPK with the conserved C-patch region of lincRNA-p21. We have shown that the binding of hnRNPK to the C-patch RNA occurs primarily through the third KH domain (KH3). This binding is specific to the C-patch single-stranded loop of the RNA. We used NMR spectroscopy to map the residues of KH3 that are involved in binding to the C-patch RNA. The NMR data show that the conserved -GKGG- motif in the variable loop of KH3, between α1 and α2, is involved in binding the conserved heptameric single-stranded C-patch RNA from conserved domain A of lincRNA-p21.
We have also investigated the other sequence features of lincRNA-p21. The thesis reports for the first time that the inverted Alu repeat of lincRNA-p21 contains an RNA motif called SIRLOIN (SINE-derived nuclear RNA LOcalizatIoN). This motif has been reported to be essential for the nuclear localization of Alu-repeat-containing lncRNAs. We used ITC and NMR to probe the interaction of the SIRLOIN motif (a ~49-mer RNA) with hnRNPK and its KH domains. We have shown that the SIRLOIN motif also binds to hnRNPK through the third KH domain (KH3).
Apart from the binding of hnRNPK to different RNAs, the thesis also reports the phase separation behavior of hnRNPK. Sequence analysis of hnRNPK shows that it contains an intrinsically disordered region (IDR) between the KH2 and KH3 domains. Detailed analysis showed that this region comprises RG/RGG boxes, which have been shown to drive liquid-liquid phase separation and to bind higher-order DNA structures such as G-quadruplexes in the cell. Using bioinformatic tools such as CIDER, IUPred, and FuzDrop, we analyzed the sequence features of hnRNPK, which showed that hnRNPK has a propensity to undergo phase separation. We therefore probed the phase separation of hnRNPK in vitro. We have shown that hnRNPK does phase-separate in vitro and further characterized its phase behavior under different conditions, for example, pH and salt concentration. To identify the dominant interactions occurring in the phase-separated droplets of hnRNPK, we used 1,6-hexanediol, an aliphatic alcohol. Taken together, these results showed that multivalent interactions are involved in the phase separation of hnRNPK. Since hnRNPK is predominantly an RNA-binding protein, we examined its phase behavior in the presence of different RNAs. The results showed that hnRNPK forms well-defined droplets in the presence of specific RNA sequences (the C-patch-containing RNA and SIRLOIN RNA sequences from lincRNA-p21). Interestingly, non-specific RNAs lead to the formation of amorphous aggregates of hnRNPK. We also checked whether hnRNPK forms biomolecular condensates in living cells, using the HeLa Kyoto mammalian cell line for these experiments. We found that under oxidative stress, hnRNPK forms predominantly nuclear condensates. Using 1,6-hexanediol and FRAP (Fluorescence Recovery After Photobleaching) experiments, we showed that the condensates formed by hnRNPK in cells are fluid-like in nature. | en_US |