Analyses of thousands of ectopically expressed variants demonstrate that this folding both enhances processing and increases mRNA metabolic stability. Even folds with predicted stabilities resembling those of random sequences can enhance processing. Structure-controlled processing can also regulate neighboring gene expression.
Moreover, high-throughput chemical probing of mRNAs from diverse eukaryotes shows that substantially fewer regions are structured in cells than in vitro, in part due to ATP-dependent cellular processes that unfold mRNA structures (Ding et al.). Even the highly stable RNA G-quadruplexes are globally unfolded in human, mouse, and yeast cells (Guo and Bartel). In addition, the functional relevance of the mRNA structures that are supported by in vivo probing remains largely unaddressed; although several mRNA features are associated with folding in cells, the observations remain largely descriptive and correlative (Mortimer et al.).
Cleavage and polyadenylation of Pol II transcripts are critical for transcription termination, cytoplasmic localization, RNA stability, and translation (Tian and Manley). Based on observations made for a few viral RNAs Ahmed et al. Of these 2,, most 1, had no alternative PASs within 10–30 nt of the poly(A) site, suggesting that most of these distal canonical PASs were functional despite their suboptimal distances from the poly(A) sites. Each member of the library contained two potential poly(A) sites.
For each PAS–poly(A) site distance from 5 to 41 nt, we generated either all possible sequence variants or a sufficient number of variants (up to tens of thousands) such that sequence-specific effects averaged out (Figure 1B). In contrast, the PAS–poly(A) site distances of the mRNA processed at the query site peaked sharply at 15–18 nt, implying optimal processing efficiencies at these distances (Figure 1C, red).
They were also largely consistent with previous mutagenesis studies (McDevitt et al.). Interestingly, for transcripts with intermediate distances (17–27 nt), the opposite was observed, implying that structure was slightly disfavored at these distances (Figure 1D), which can be explained by the inefficiency of processing when the PAS–poly(A) site distance was too short [stable structures from arbitrary sequences were likely longer than 14 nt and thus would have more frequently created effective distances too short for efficient processing, according to our map of PAS distance constraints (Figure 1C)].
To extend our analysis to endogenous mRNAs, we first reanalyzed existing 3P-seq [poly(A)-position profiling by sequencing] datasets from four human cell lines (Nam et al.). These regions were then sorted and grouped by PAS–poly(A) site distance using a group size of 10 nt and a step of 1 nt, and then the predicted pairing probability of each of the nucleotides upstream of the poly(A) site was averaged across regions in each group and visualized on a heat map (Figure 2A).
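The grouping and averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (one `(distance, probabilities)` tuple per region, with all probability lists of equal length), and the inclusive window bounds are assumptions made for the example. Only the group size of 10 nt and the step of 1 nt come from the text.

```python
def average_by_distance_window(regions, group_size=10, step=1):
    """Average per-position pairing probabilities over sliding distance windows.

    regions: list of (distance, probs) tuples, where `distance` is the
             PAS-poly(A) site distance in nt and `probs` is a list of
             predicted pairing probabilities (hypothetical layout: same
             length and same position order for every region).
    Returns a dict mapping (window_start, window_end) to the list of
    position-wise mean pairing probabilities for regions in that window.
    """
    distances = [d for d, _ in regions]
    heat_map = {}
    # Slide a window of `group_size` nt across the observed distances.
    for start in range(min(distances), max(distances) - group_size + 2, step):
        end = start + group_size - 1  # inclusive upper bound (assumption)
        members = [probs for d, probs in regions if start <= d <= end]
        if not members:
            continue  # no region falls in this window
        n = len(members)
        # zip(*members) walks the regions column by column (per position).
        heat_map[(start, end)] = [sum(col) / n for col in zip(*members)]
    return heat_map


# Toy input: three regions with two probed positions each.
toy_regions = [(20, [0.0, 1.0]), (25, [1.0, 1.0]), (31, [0.5, 0.5])]
toy_map = average_by_distance_window(toy_regions)
```

Each row of the resulting map corresponds to one distance group (for example 20–29 nt), which matches the way the groups are labeled in the analysis of shuffled controls.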
A diagonal region of low pairing probability transected the map at the positions of the PASs. A striking difference was observed on each side of this diagonal, with PAS-downstream regions (top right) predicted to be more folded than PAS-upstream regions (bottom left). For regions with PAS–poly(A) site distances between 25 and 60 nt, two stretches of roughly equally sized folded areas (blue) were visible, presumably corresponding to the two sides of a folded stem.
(B) Differences in predicted folding stability observed between PAS-downstream sequences and shuffled sequences (red), and between length-matched PAS-upstream sequences and shuffled sequences (black), plotted with respect to PAS–poly(A) distance. Each color depicts the overlay of 1, lines, with each line showing the result of one random shuffling.
The increased fluctuation observed at larger distances reflects the fewer regions with larger distances. To check whether the increased propensity of PAS-downstream regions to fold was simply due to differing nucleotide compositions, we shuffled the PAS-downstream sequences while preserving dinucleotide frequency (Jiang et al.). For regions with PAS–poly(A) site distances ranging from 20–29 to 80–89 nt, the PAS-downstream sequences tended to have more stable predicted structures than their shuffled control sequences (Figure 2B, red).
In contrast, length-matched PAS-upstream sequences were not predicted to be significantly more stable than their shuffled control sequences (Figure 2B, black).
To learn if the structures predicted to form between PASs and poly(A) sites are folded in vivo, we initially examined structural probing data acquired from human cells, in which intracellular dimethyl-sulfate (DMS) accessibility was detected by high-throughput sequencing of reverse transcriptase (RT) stops (Rouskin et al.). However, the DMS-seq reads corresponding to RT stops near the poly(A) tail were often too short to map uniquely, resulting in a highly skewed distribution of reads near the poly(A) sites, which prevented accurate normalization and comparison.
In this hybrid method, we treated the cells with DMS, isolated polyadenylated RNA, fragmented this RNA by partial digestion with Ribonuclease T1 (which cuts after G), purified fragments with a poly(A) tail, reverse transcribed these fragments using a primer that pairs to the beginning of poly(A) tails, and then sequenced the cDNA (Figure 3A). See text for details. The DIM-2P-seq mutation frequencies, which represent intracellular DMS accessibility of A and C residues, are plotted in the bar graph and color-coded on the predicted secondary structure, according to the key.
See also Figure S1. Consistent with known DMS specificity Rouskin et al. At a minimum coverage of reads per sample, we quantified the mutation frequency for 0. Indeed, a structure was predicted to form starting 2 nt downstream of the PAS (Figure 3B), although if fully folded, this structure would lead to an effective PAS–poly(A) site distance of only 6 nt, which would be too short for processing (Figure 1C).
Indeed, our probing results indicated that only the distal part of this predicted structure (positions 6–21, shaded in gray) was folded in cells, as shown by the depletion of mutations at the As and Cs predicted to be paired in this region and the presence of mutations in flanking As and Cs (Figure 3B). Folding of this region spanning positions 6–21 would yield an effective distance of 16 nt, which would be optimal for processing (Figure 1C). Consistent with a role of the structure in facilitating processing, mutations predicted to destabilize the structure caused a decrease in usage, and compensatory mutations predicted to restore folding stability rescued usage (Figure S1B–C).
For example, among the As and Cs predicted to be paired, those in the PAS-upstream region were on average 8. To see if this difference between intracellular folding observed upstream and downstream of the RPS5 PAS occurred more generally, we generated a heat map with the same coordinates as those of Figure 2A but summarizing in vivo DMS accessibility, as measured by positional mutation frequency. Indeed, this DMS-accessibility map closely resembled the map of predicted structure (Figure 2A).
One difference between the two was the more uniform signal observed between the PAS region (diagonal) and PAS-upstream region (bottom left) in the in vivo DMS-accessibility map (Figure 3C) compared to the more distinct signal observed for the PAS diagonal in the predicted-pairing map (Figure 2A). The average DMS-induced mutation rate was 2. Assuming the mutation rate for completely unpaired nucleotides was that of the PAS regions 3. These estimates would have been inflated if protein binding contributed to the protection observed within PAS-downstream or -upstream regions, whereas they would have been dampened if protein binding or RNA structure protected some of the PAS regions.
We also note that RNA structure need not have been uniform to contribute to the protection signal; partial protection of an A or C paired in only half of the mRNAs or during only half of the DMS incubation would have contributed proportionally. These results motivated experiments to explore functional ramifications for the folding of this privileged mRNA region.
This high processing efficiency despite such an extended PAS–poly(A) site distance is presumably facilitated by a nt stem-loop predicted to reduce the effective distance to 16 nt (assuming the stem-loop contributed 2 nt). The predicted stem-loop appears to have been conserved across 62 sequenced mammalian genomes, with 3 of its 11 base pairs undergoing covariation that maintained pairing (Figure 4A).
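The effective-distance reasoning used here can be written out as a small calculation. The sketch below assumes, as the text does, that a folded segment contributes 2 nt to the effective distance, and it uses the 15–18 nt window at which processing peaked. The function names and the example lengths (a 30-nt linear distance with 16 or 26 folded nucleotides) are hypothetical and chosen only to illustrate the arithmetic.

```python
def effective_distance(linear_distance, folded_length, folded_contribution=2):
    """Effective PAS-poly(A) site distance after a segment folds.

    linear_distance:     nucleotides between the PAS and the poly(A) site
    folded_length:       nucleotides absorbed into the folded structure
    folded_contribution: nt the folded structure still contributes
                         (2 nt, following the assumption stated in the text)
    """
    if folded_length > linear_distance:
        raise ValueError("folded segment cannot exceed the linear distance")
    return linear_distance - folded_length + folded_contribution


def in_optimal_window(distance, low=15, high=18):
    """True if the distance falls within the 15-18 nt peak of processing."""
    return low <= distance <= high


# Illustrative numbers only: a 30-nt distance with a 16-nt folded segment.
partial_fold = effective_distance(30, 16)   # 30 - 16 + 2
full_fold = effective_distance(30, 26)      # 30 - 26 + 2
```

Under these illustrative numbers, the partial fold lands inside the optimal window whereas the more extensive fold yields a distance that is too short, which mirrors the logic applied to the partially folded structures discussed in the text.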
Positions with multiple covariations supporting pairing among the aligned mammalian genomes are highlighted, with alternative pairs and their frequencies listed (green). See also Figure S2. (D) The effect of single-nucleotide substitutions on poly(A)-site usage, depicted as a sequence logo. The height of each base was scaled by its usage relative to wild type, and bases were stacked, placing the substitutions with stronger effects closer to the x-axis. The sequence and secondary structure (bracket notation) are shown above.
The logo plot was generated by kpLogo (Wu and Bartel). (E) Example of a mutant–rescue pair. (F) The relative usage for all 48 mutant–compensatory-mutant pairs (left) and all 96 mutant–noncompensatory-mutant pairs (right) in the library. Pairs with usage values inconsistent with rescue are highlighted (red). Shown at the top are the ratios of rescue-inconsistent:rescue-consistent pairs, as well as the P value for observing at least this number of rescue-consistent pairs, estimated from 10^6 shufflings of the usage measurements.
This library included over 30, variants, with mostly 1–3 mutations, including all single-nucleotide substitutions and most of the double-nucleotide substitutions.
Having confirmed the accuracy and robustness of our measurements, we examined the relationship between variant usage and folding potential. The strongest effect was a G-to-C mutation at position 27 (G27C), which resulted in a 5. In contrast, single-nucleotide substitutions in the loop or in wobble or mismatched pairs had less effect, with slight increases in usage for changes predicted to stabilize the structure, including mutations in the first base-pair that converted the G–U wobble to a C–G or U–A base-pair, and mutations that converted the internal C–A mismatch to a U–A or C–G base-pair (Figure 4D).
Causality between an RNA structure and a function is conventionally shown using a pair of experiments in which the loss of function caused by a mutation disrupting a Watson–Crick base pair is rescued by a compensatory mutation at the complementary position that restores the Watson–Crick pair.
Indeed, the strong decrease in usage observed for the G27C substitution was rescued by a compensatory mutation at the complementary position (C7G) (Figure 4E). Extending this type of analysis to each of the 48 possible mutation–compensatory-mutation pairs involving the 8 Watson–Crick base pairs of the wild-type structure (not counting the isolated C–G base pair near the loop) showed that 46 supported the hypothesis that Watson–Crick pairing increased the efficiency of poly(A)-site usage, i.
Nevertheless, the importance of pairing at positions 11 and 13 was supported by each of the other 10 mutation–compensatory-mutation combinations informative for these two pairs. Thus, our saturating mutation–rescue analysis confirmed a causal role for RNA structure in enhancing poly(A)-site usage. To assess the effect of folding on the processing of endogenous transcripts produced from their native genomic locus, we used Cas9 Cong et al.
The wild-type sequence and the predicted structure (bracket notation) are also shown (bottom), with the expected Cas9 cut site (red vertical line). See also Figure S3. Shown are expression values for the mutagenized cells relative to those of wild-type cells after mutagenesis for the indicated number of days. Error bars indicate standard deviation based on three technical replicates. The positions of the primer pairs relative to the gene models are shown below the graph.
See also Figure S4. Presumably the detrimental structural consequences of the more moderate deletions offset the favorable effects of these deletions on the PAS–poly(A) site distance. Interestingly, several mutants for which the deletion removed most of the stem-loop and created PAS–poly(A) site distances of about 15 nt were processed with efficiencies resembling or exceeding that of wild type (Figure 5C).
Although consistent with the model in which the stem-loop structures merely function to reduce effective distance, these results raised the question as to why the stem-loop structures have not been deleted over the course of evolution to achieve the most efficient usage. One potential advantage of the structures is that they provide an opportunity for regulation through factors or conditions that influence stem-loop folding. Consistent with such an influence on neighboring gene expression and the reported role of Spef1 in spermatogenesis Chan et al.
However, differential stability of the processed product would also contribute to its changed accumulation. Nascent RNA was isolated after labeling with 4sU for 30 minutes. The correlation decreased when using less stringent read cutoffs, which was entirely attributable to increased noise in stability measurements (Figure S5D–F). (D) The relative metabolic stability of all 48 mutant–compensatory-mutant pairs (left) and all 96 mutant–noncompensatory-mutant pairs (right) in the library; otherwise as in Figure 4F. To examine whether this relationship was indeed causal, we analyzed mutant–compensatory-mutant pairs as in Figure 4F.
Although the results were not as consistent as those observed for relative usage, perhaps due to more experimental variability and smaller effect sizes for the metabolic stability measurements, the number of pairs that supported the role of structure was twice that of pairs that did not (Figure 6D, left). These analyses of mutant pairs thereby established a causal relationship between RNA structure and mRNA metabolic stability.
One way to reconcile these results would be if structures with predicted stabilities close to those formed by random sequences of the same length and dinucleotide composition (hereafter called close-to-random structures) were allowed to fold and were functional in this privileged region of the mRNA. In an analysis modeled after that which we performed for endogenous sequences (Figure 2C), the usage for each variant was plotted as a function of the probability that its predicted folding was less stable than a shuffled control sequence (Figure 7A).
Even for the 7, variants with predicted structures less stable than those of most shuffled control sequences i. The result for the wild type is highlighted (red). Probing results are displayed on the predicted secondary structure of the PAS-downstream sequence, as in Figure 3B. Shown at the bottom are the relative usages of a mutant–compensatory-mutant pair, as determined using agarose-gel electrophoresis to resolve RT-PCR products (usage values, normalized to that of wild type, shown below gel).
Otherwise, this panel is as in C. Nonetheless, our DMS probing results supported the in vivo folding of the distal stem-loop of each of these predicted structures (Figure 7C–D, gray shading). These results demonstrated that the close-to-random structures were functional in each of the genes we tested. More generally, we have made several unexpected observations that challenge the prevailing view of the physiological relevance of structures in mammalian mRNAs.
One unexpected result was the prevalence of functional RNA structures in human mRNAs despite the global tendency of mRNAs to be less folded in eukaryotic cells than they are in vitro (Ding et al.). Helping to reconcile our findings with the previously reported global tendencies was the unique location of the structures we observed. Their unique location also allows these structures to directly affect an important step of gene expression, i.
How might PAS–poly(A) site distance influence transcript stability? Importantly, both of these PABPC1-binding scenarios would loop out intervening sequences, which would not only promote folding of these sequences but might also insulate these structures from cellular activities that unfold structures upstream of the PAS. These results were in line with the observations but not the interpretations of previous computational analyses showing that mRNAs or UTRs as a whole are not more stably folded than random sequences (Workman and Krogh; Clote et al.).
In addition, these close-to-random yet functional structures presumably have more relaxed conformational constraints compared to structures of rRNAs, tRNAs, and some other ncRNAs. These ncRNAs must specifically interact with other factors, which presumably drives the formation of highly unique and stable structures that statistically differ from those of random sequences.
In contrast, to reduce effective distance, the precise shape or location of the fold would be less consequential. The same would be true for other possible functions of structure within mRNAs, such as burying cis elements, slowing ribosomes, or reducing exposure to ribonucleases. Perhaps similar approaches, especially when combined with in vivo structure probing and massively parallel mutagenesis and reporter assays, will uncover additional molecular, cellular, or physiological contexts in which structures are frequently allowed to form within mammalian mRNAs and exert influence on other important processes, such as splicing, editing, export, translation, and degradation.
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, David Bartel. Each of these cell lines was of female origin. For calculating pairing probability, -p was used instead of -p0. Significance of predicted folding stability was assessed by shuffling the original sequence 10, times while preserving dinucleotide frequency (Jiang et al.).
Oligonucleotides used to construct and analyze these libraries were purchased from IDT and are listed in the corresponding sections of Table S1. The RT products were amplified using Phusion HF (12–18 cycles) and a forward primer that hybridized upstream of the query region. Genomic DNA was amplified using the same forward primer and a reverse primer hybridizing downstream of the poly(A) site. Sequencing reads with mutations outside the intended region were discarded.
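The shuffle-based significance assessment can be illustrated with a short sketch. It covers only the final comparison step and assumes that the predicted folding free energies of the original sequence and of its dinucleotide-preserving shuffles have already been computed elsewhere; neither the shuffling procedure nor the folding call is reproduced here. The function name, the strict-inequality treatment of ties, and the example energies are assumptions made for illustration.

```python
def fraction_shuffles_more_stable(native_dg, shuffled_dgs):
    """Fraction of shuffled controls predicted to fold more stably.

    native_dg:    predicted folding free energy (kcal/mol) of the original
                  sequence; more negative means more stable
    shuffled_dgs: predicted free energies of the shuffled control sequences
    A small return value means the original sequence is predicted to be
    more stable than most of its shuffled controls.
    """
    if not shuffled_dgs:
        raise ValueError("at least one shuffled control is required")
    # Ties are counted as 'not more stable' (an arbitrary choice here).
    more_stable = sum(1 for dg in shuffled_dgs if dg < native_dg)
    return more_stable / len(shuffled_dgs)


# Hypothetical energies for one sequence and four shuffled controls.
example_fraction = fraction_shuffles_more_stable(-10.0, [-12.0, -8.0, -5.0, -11.0])
```

With many shuffles per sequence, this fraction serves as an empirical estimate of the probability that a sequence's predicted folding is less stable than that of a shuffled control.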
A pseudocount of 1 was added to all raw read counts for calculating usage. To calculate relative usage, usage was normalized to that of the wild-type sequence. Similarly, for each CENPB variant, relative usage at steady state was divided by relative usage in the nascent sample, and then this ratio was normalized to that of wild type to yield the normalized metabolic stability. Although this approach for calculating relative metabolic stabilities did not capture complex kinetic behaviors (such as deviation from simple exponential decay attributable to either multiple phases of decay or different subpopulations of mRNAs from the same gene, each with different decay rates), for the mRNA from each gene (or from each CENPB variant), this approach did provide a single value that represented its overall behavior, effectively weighting the aggregate effects of different kinetic phases and different subpopulations in proportion to their relative contributions to steady-state abundance.
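The normalization steps above can be sketched in a few lines. The pseudocount of 1, the normalization to wild type, and the steady-state-to-nascent ratio follow the text. The definition of raw usage as the ratio of RNA-derived to DNA-derived read counts is an assumption made for this example (the surviving text does not state the formula), and the function names, the variant labels, and the read counts are hypothetical.

```python
PSEUDOCOUNT = 1  # added to every raw read count, as stated in the text


def usage(rna_reads, dna_reads):
    """Raw usage of a variant.

    ASSUMPTION: usage is taken here as RNA reads over DNA reads; the
    exact formula is not given in the surviving text.
    """
    return (rna_reads + PSEUDOCOUNT) / (dna_reads + PSEUDOCOUNT)


def relative_usage(counts, wild_type="WT"):
    """Normalize each variant's usage to that of the wild-type sequence.

    counts: dict mapping variant name to (rna_reads, dna_reads).
    """
    wt_usage = usage(*counts[wild_type])
    return {name: usage(*c) / wt_usage for name, c in counts.items()}


def normalized_metabolic_stability(steady_counts, nascent_counts, wild_type="WT"):
    """Steady-state relative usage over nascent relative usage,
    normalized to the same ratio for wild type."""
    steady = relative_usage(steady_counts, wild_type)
    nascent = relative_usage(nascent_counts, wild_type)
    ratio = {name: steady[name] / nascent[name] for name in steady}
    return {name: r / ratio[wild_type] for name, r in ratio.items()}


# Hypothetical read counts: (RNA reads, DNA reads) per variant.
steady_state = {"WT": (99, 9), "mut": (49, 9)}
nascent = {"WT": (99, 9), "mut": (99, 9)}
stability = normalized_metabolic_stability(steady_state, nascent)
```

In this toy example the mutant is used as efficiently as wild type in the nascent sample but accumulates to half the wild-type level at steady state, so its normalized metabolic stability is half that of wild type.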