RNA-SHAPE structural analysis using the GenomeLab GeXP system for robust characterization of therapeutic RNA
Mario A. Pulido¹, Jim Thorn², Jade Tuck³ and Victoria Smith³
¹SCIEX, USA; ²SCIEX, UK; ³Center for Process Innovation, UK
The rapidly evolving field of RNA therapeutics has driven a need for accurate and efficient nucleic acid characterization. Effective characterization must measure product-specific attributes, such as RNA structure, that bear on final product stability. This technical note demonstrates how capillary electrophoresis (CE) can address RNA identity and structural stability challenges throughout the life cycle of RNA-based products. A simplified three-day workflow combining CE with chemical probing delivered robust structural information for a 1,084-nucleotide in vitro transcribed RNA molecule. The selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) reaction was highly reproducible for a subset of probed nucleotides, with a coefficient of variation (%CV) of <5% in SHAPE reactivity.
In this study, the SHAPE reaction was applied to measure nucleotide structure (Figure 1).1,2 The mechanism of the SHAPE reaction allowed unconstrained nucleotides to be identified along a 1,084-nucleotide single-stranded RNA molecule. This in vitro transcribed RNA is capped with a 5’ cap 1 structure, encodes green fluorescent protein (GFP) and has a polyA tail at the 3’ end. The RNA was interrogated for the nucleophilic reactivity of each 2’-hydroxyl using the acylating reagent N-methylisatoic anhydride (NMIA) (Figure 2). Previous studies have demonstrated the consistency of chemical probing combined with capillary gel electrophoresis (CGE) for high-throughput characterization of RNA molecules.1-3 Building on this, the RNA-SHAPE reaction was applied here to characterize a relatively long, single-stranded RNA product, yielding both qualitative and quantitative results. This readily available workflow adds another layer of inquiry, verification and quality control focused on the structural elements of RNA molecules.
Nucleic acid samples: The 1,084-nucleotide mRNA sample was provided by the Center for Process Innovation (CPI) in the UK. The in vitro transcribed single-stranded RNA was synthesized from a DNA plasmid template encoding the elements required to produce the mRNA: a T7 promoter sequence to initiate transcription, a 5’UTR, the coding sequence for GFP, a 3’UTR and a polyA tail. The DNA plasmid used for the sequencing reactions was also provided by CPI. Both the mRNA and the DNA plasmid samples were stored at -80°C until analysis.
Primers for SHAPE reaction: Each primer set consisted of four reverse primers with an identical targeting sequence, each labeled with a different fluorophore: Cy5, Cy5.5, Alexa 750 or IR800CW. Five primer sets, with starting positions of 322, 517, 836, 944 and 1,084, were used to cover the 1,084-nucleotide mRNA transcript. Consistent with primer design conventions, these primers were ~20 nucleotides long with a GC content of ~50%. A total of 20 HPLC-purified, lyophilized primers at a 100 nmol scale were obtained from Integrated DNA Technologies (Coralville, Iowa).
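As an illustration of the ~20-nucleotide and ~50% GC conventions, a primer can be screened with a few lines of Python. This is a minimal sketch: the tolerances (±2 nt, ±10 percentage points) and the example sequence are hypothetical and do not correspond to the primers used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the fraction (0-1) of G and C bases in a DNA primer sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


def meets_primer_conventions(seq: str,
                             target_len: int = 20, len_tol: int = 2,
                             target_gc: float = 0.50, gc_tol: float = 0.10) -> bool:
    """Check a primer against ~20 nt / ~50% GC design conventions.

    The tolerances are hypothetical defaults, not values from the study.
    """
    length_ok = abs(len(seq) - target_len) <= len_tol
    gc_ok = abs(gc_content(seq) - target_gc) <= gc_tol
    return length_ok and gc_ok


# Hypothetical 20-mer, not one of the study primers
example = "ACGTACGTACGTACGTACGT"
print(len(example), gc_content(example), meets_primer_conventions(example))
# prints: 20 0.5 True
```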
Molecular biology reagents: NMIA (PN M25), SuperScript III (PN 18080044), dNTP Mix (PN R0192), Sequenase Cycle Sequencing Kit (PN 78500), Magnesium Chloride (PN AM9530G), Potassium Chloride (PN AM9640G), Tris-EDTA (TE) buffer (PN 12090015), Tris-HCl, pH 8.0 (PN 15568025), Ethanol (PN BP2818), EDTA (PN AAJ15694AE) and nuclease-free water (not DEPC-treated, PN AM9930) were purchased from Thermo Fisher Scientific (Waltham, Massachusetts). Dimethyl Sulfoxide (DMSO, PN D8418), Sodium Acetate, 3M pH 5.2 (PN S7899), Sodium Hydroxide (PN 72068) and Hydrochloric Acid (PN H1758) were purchased from Sigma-Aldrich (St. Louis, Missouri).
CGE: The DNA sep cap array 33-75B (PN 608087), separation buffer (PN 608012), mineral oil (PN 608114), separation gel (20 mL; PN 391438) and the sample loading solution (SLS, PN 608082) were obtained from SCIEX (Framingham, Massachusetts).
DNA purification: CleanSEQ paramagnetic beads (PN A29151) and the magnetic plate (PN A32782) were obtained from Beckman Coulter Life Sciences (Indianapolis, Indiana).
Preparing the RNA-SHAPE reaction:
RNA conditioning and RNA-2’-O-adduct formation
For each SHAPE reaction, about 12 pmol of RNA was conditioned in a high-salt buffer containing magnesium chloride and potassium chloride, as described previously.3 The conditioned RNA was then split into two equal volumes (72 μL each): one for NMIA chemical probing and the other kept as intact control RNA treated with DMSO. The RNA-acylation (positive SHAPE) reaction, which provides conditions for RNA-2’-O-adduct formation, was prepared by adding 8 μL of a 25 mM NMIA solution to 72 μL of conditioned RNA with vigorous mixing and incubating at 37°C for about 50 minutes. The control (negative) reaction was prepared by adding 8 μL of DMSO to the second vial containing 72 μL of conditioned RNA. The modified and control RNA were recovered by ethanol precipitation, reconstituted in 5 μL of TE buffer and stored overnight at -20°C.
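The final reagent concentrations in each 80 μL reaction follow from a simple dilution calculation. The sketch below assumes that the ~12 pmol of RNA is split evenly between the two tubes and that the NMIA stock is prepared in DMSO (the solvent is not stated above), so both tubes would contain ~10% (v/v) DMSO.

```python
def final_concentration(stock: float, v_added: float, v_sample: float) -> float:
    """Dilution (C1*V1 = C2*V2): concentration after adding v_added of stock to v_sample.

    Volumes must share the same unit; the result has the unit of `stock`.
    """
    return stock * v_added / (v_sample + v_added)


V_RNA_UL = 72.0       # conditioned RNA per reaction
V_REAGENT_UL = 8.0    # NMIA solution or DMSO
NMIA_STOCK_MM = 25.0

nmia_final_mM = final_concentration(NMIA_STOCK_MM, V_REAGENT_UL, V_RNA_UL)
# Assumption: NMIA stock is made in DMSO, so both tubes receive the same DMSO fraction
dmso_percent = final_concentration(100.0, V_REAGENT_UL, V_RNA_UL)

# Assumption: ~12 pmol RNA split evenly between the two tubes
rna_pmol_per_tube = 12.0 / 2
rna_final_nM = rna_pmol_per_tube / (V_RNA_UL + V_REAGENT_UL) * 1000  # 1 pmol/uL = 1000 nM

print(nmia_final_mM, dmso_percent, round(rna_final_nM, 1))
# prints: 2.5 10.0 75.0
```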
Reverse transcription with Cy5 and Cy5.5 primers
After the 5 μL of reconstituted RNA was thawed on ice, 1 μL of 10 μM Cy5 primer was added to the modified RNA (positive reaction). Similarly, 1 μL of 10 μM Cy5.5 primer was added to 5 μL of the reconstituted control RNA (negative reaction). As previously described, a 2.5X reverse transcription mixture was added to each reaction, and cDNA was synthesized for 50 minutes at 50°C.3 The positive and negative reactions were then combined, purified by ethanol precipitation, resuspended in SLS and kept at 4°C until CE analysis.
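The volume of 2.5X reverse transcription mixture is not stated above. If the mixture is meant to bring each 6 μL primed RNA sample to 1X, the required volume can be solved from fold × V_mix / (V_sample + V_mix) = 1. The sketch below rests on that assumption and is not the published protocol.

```python
def concentrate_volume(v_sample_ul: float, fold: float) -> float:
    """Volume of an N-X master mix that brings v_sample_ul to 1X final.

    Assumes the sample itself contributes none of the mix components.
    Solves fold * v_mix / (v_sample + v_mix) = 1 for v_mix.
    """
    if fold <= 1:
        raise ValueError("master mix must be more concentrated than 1X")
    return v_sample_ul / (fold - 1)


# 5 uL reconstituted RNA + 1 uL of 10 uM primer per reaction (from the protocol)
v_sample = 5.0 + 1.0
v_mix = concentrate_volume(v_sample, 2.5)   # hypothetical: assumes a 1X final reaction
total = v_sample + v_mix
primer_nM = 10_000 * 1.0 / total            # 10 uM = 10,000 nM primer, diluted into the total
print(v_mix, total, primer_nM)
# prints: 4.0 10.0 1000.0
```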
Sequencing ladders with Alexa 750 and IR800CW primers
To identify the RNA residues enriched for RNA-2’-O-adduct formation, and thus informative of RNA folding, the DNA plasmid matching the in vitro transcribed RNA was used as a sequencing template. Two separate DNA sequencing reactions, one using ddC-based and the other using ddT-based termination reagents from the Sequenase Cycle Sequencing Kit, were each prepared with 2.5 μL of DNA template diluted to 0.04 μg/μL. As described in the kit manual, a 12 μL reaction volume supplemented with 10% DMSO and containing 1 μL of the corresponding 10 μM primer (Alexa 750 for ddC-based termination or IR800CW for ddT-based termination) was mixed with an equal volume (12 μL) of the corresponding termination mix (ddC or ddT). The sequencing reactions were run in a thermal cycler using a 46-cycle program of 95°C for 30 seconds, 55°C for 30 seconds and 72°C for 1 minute. The products were purified with CleanSEQ paramagnetic beads according to the product manual, and the sequencing ladder samples were stored at 4°C until CE analysis.
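For planning, the template amount, final reaction volume and programmed cycling time can be tallied as below. This is a sketch: it sums only the hold times listed above and ignores ramp times and any initial denaturation or final hold that the kit manual may add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleStep:
    temp_c: float
    hold_s: int


def total_hold_minutes(steps: list[CycleStep], cycles: int) -> float:
    """Sum of programmed hold times; ramp times and any initial hold are ignored."""
    return cycles * sum(step.hold_s for step in steps) / 60


program = [CycleStep(95, 30), CycleStep(55, 30), CycleStep(72, 60)]
template_ng = 2.5 * 0.04 * 1000   # 2.5 uL at 0.04 ug/uL, converted to ng
reaction_ul = 12 + 12             # sequencing reaction + equal volume of termination mix

print(total_hold_minutes(program, 46), round(template_ng, 6), reaction_ul)
# prints: 92.0 100.0 24
```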
CGE for SHAPE reactivity profiling
The combined positive and negative reactions were mixed with the corresponding sequencing ladder reactions. For example, a complete SHAPE sample for CGE analysis had a final volume of 40 μL containing 45% (v/v) combined positive and negative reaction, 20% ddC-terminated sequencing reaction and 35% ddT-terminated sequencing reaction. This volume ratio was found to be optimal for all primer sets and reactions in this study. The nucleic acid products were separated by automated CGE as previously described. Primer crosstalk and mobility-shift correction reactions were performed and analyzed with the GenomeLab GeXP system, as previously described.4 Data files were exported from the GenomeLab GeXP system for SHAPE reactivity calculations and RNA structural modeling.
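The percentage recipe translates into pipetting volumes as follows. The component names are descriptive labels chosen for this sketch; integer percentages keep the arithmetic exact.

```python
def mix_volumes(total_ul: float, percents: dict[str, int]) -> dict[str, float]:
    """Split a final volume into component volumes from integer percentages (v/v)."""
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return {name: total_ul * pct / 100 for name, pct in percents.items()}


recipe = {
    "combined positive/negative SHAPE reaction": 45,
    "ddC ladder (Alexa 750)": 20,
    "ddT ladder (IR800CW)": 35,
}
for name, vol in mix_volumes(40.0, recipe).items():
    print(f"{name}: {vol:.1f} uL")
```

For the 40 μL sample this gives 18, 8 and 14 μL, respectively.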
Data analysis and RNA structural modeling
Primer crosstalk adjustment, mobility shift corrections, alignment of the SHAPE reactions to the reference sequence and SHAPE reactivity calculations were performed using ShapeFinder software. As needed, files of SHAPE reactivity constraints were created in Excel and saved as .txt files. The RNAthor (rnathor.cs.put.poznan.pl) and ViennaRNA (rna.tbi.univie.ac.at) web servers were used for RNA structural modeling and statistical analysis of the RNA-acylation distribution across the RNA fragment.
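As an alternative to Excel, a constraint file can be generated programmatically. The sketch below writes one tab-separated line per nucleotide (1-based position, reactivity) and skips positions without data. This two-column layout is an assumption; the exact format accepted by RNAthor or the ViennaRNA web services should be confirmed in their documentation.

```python
import math
from pathlib import Path
from typing import Optional, Sequence


def constraint_text(reactivities: Sequence[Optional[float]]) -> str:
    """Format reactivities as tab-separated '<1-based position>\t<value>' lines.

    Positions with no data (None or NaN) are omitted (an assumption; some tools
    instead expect a placeholder such as -999).
    """
    lines = []
    for pos, value in enumerate(reactivities, start=1):
        if value is None or math.isnan(value):
            continue
        lines.append(f"{pos}\t{value:.3f}\n")
    return "".join(lines)


def write_constraint_file(path: str, reactivities: Sequence[Optional[float]]) -> None:
    """Write the constraint text to a plain .txt file."""
    Path(path).write_text(constraint_text(reactivities))
```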
RNA-2’-O-adduct formation: In support of the RNA structure-function relationship and its role in effective therapeutics, this section demonstrates the reproducibility of CGE-based structural determination for a relatively large RNA molecule. As a model system, an in vitro transcribed RNA product with minimal matrix influence on the analyte was examined. Future studies assessing RNA folding within an encapsulating matrix, such as lipid nanoparticles, could provide structural insight into the stringency and stability of RNA products.
Figure 3 shows electropherograms generated by the GenomeLab GeXP system for four different RNA-SHAPE reactions, prepared as described in the materials and methods section and following the overview of RNA-SHAPE analysis of an extended RNA molecule in Figure 2. The GenomeLab GeXP system resolved nucleotide reactivity toward the acylating reagent at single-base resolution. Panels A–D in Figure 3 provide RNA structural folding information for residues 1–864, encompassing the Clean Cap structural element and the coding sequence of green fluorescent protein. The fifth primer set was used to evaluate the polyA tail structural element; however, non-specific priming was observed and led to an inconclusive or incomplete alignment of the polyA tail to the reference sequence. While studies have suggested inserting RNA sequences flanked by structural elements of an expression cassette, this strategy could skew structural models. This limitation of in vitro transcription design calls for future studies that incorporate benign structural elements into final RNA products. Nevertheless, this technical note demonstrates a workflow for RNA structural determination with minimal sample preparation complexity and basic computational requirements, using semi-automated software to calculate SHAPE reactivity. RNA structural models based on SHAPE reactivity values can be built with established web servers and optimized by adjusting thresholds and computational parameters.
Panel A shows an electropherogram from the GenomeLab GeXP system with nucleic acid products eluting up to 50 minutes. These cDNA products, generated with Primer Set 1, provide SHAPE reactivity profiling from nucleotide position 1 to base 322. Four reactions are visible in the electropherogram. The blue, continuous trace shows the positive SHAPE reaction; variation in blue peak height reflects differences in 2’-hydroxyl reactivity, with taller peaks marking unconstrained nucleotides. The green, continuous trace shows the background signal from the RNA treated with an inert reagent (DMSO). The black and red traces are the sequencing ladders: black peaks correspond to G nucleotides and red peaks to A nucleotides. Together, the G and A ladders are used to assign the identity of each nucleotide in the SHAPE reaction.
The same sequencing ladder strategy was used throughout this study, including panels B–C: black peaks correspond to G nucleotides and red peaks to A nucleotides. As shown in panel B, the cDNA products were resolved up to ~70 minutes, consistent with the mapping design shown in Figure 2: Primer Set 2 annealed at position 517 of the RNA molecule and extended cDNA products toward the Clean Cap end. The panel B electropherogram suggests several hot spots for RNA-2’-O-adduct formation, for example the relatively high blue peaks for products eluting at around 35, 42 and 55 minutes. Regions that appear flat or show lower blue peak intensity, such as fragments eluting between 45 and 50 minutes or between 60 and ~64 minutes, suggest nucleotides whose 2’-hydroxyl groups are less reactive toward NMIA.
Overall, panel C showed a higher degree of 2’-hydroxyl reactivity for nucleotides in the middle of the large mRNA fragment. Primer Set 3, binding at position 836, produced a relatively large pool of cDNA products, covering 572 residues. Interestingly, a hard stop was observed for products eluting at about 80 minutes; this stop was reproduced with Primer Set 4 (panel D) for fragments eluting at about 90 minutes.
In contrast to panels A and B, panel D covered 548 residues, and its profile closely resembled that in panel C, suggesting greater flexibility, or higher 2’-hydroxyl reactivity, for nucleotides in this region. These individual nucleotide-level measurements, used as SHAPE reactivity or constraint values, can then inform construction of an RNA model with ShapeFinder software.
High-throughput analysis of RNA-chemical probing by CGE
Converting the reactivity profiles generated by the GenomeLab GeXP system into meaningful SHAPE reactivity values for RNA structural modeling was accomplished with ShapeFinder software.4 Figure 4 summarizes results obtained with ShapeFinder from a GenomeLab GeXP input file. The representative data are from the SHAPE reaction targeting the RNA fragment starting at position 836, as described in the materials and methods. Panel A shows the data used to obtain a SHAPE reactivity value for each base in the reaction: the intensity and position of every peak in the positive and negative reagent channels, together with Gaussian integration calculations linked to the input RNA reference sequence.4 This semi-automated process was performed for the four SHAPE reactions (Primer Sets 1–4) to obtain an Integrated Peaks File with calculated absolute SHAPE reactivities for every peak in the separation performed by the GenomeLab GeXP system.
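Gaussian integration converts each fitted peak into an area. For an isolated peak with amplitude A and standard deviation σ, the area is A·σ·√(2π). The sketch below checks that closed form against numerical (trapezoidal) integration of a hypothetical sampled peak; ShapeFinder itself fits overlapping peaks, which this single-peak example does not attempt.

```python
import math


def gaussian(x: float, amplitude: float, center: float, sigma: float) -> float:
    """Height of a Gaussian peak at migration position x."""
    return amplitude * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def gaussian_area(amplitude: float, sigma: float) -> float:
    """Closed-form area under a Gaussian peak: amplitude * sigma * sqrt(2*pi)."""
    return amplitude * sigma * math.sqrt(2 * math.pi)


def trapezoid_area(xs: list[float], ys: list[float]) -> float:
    """Numerical area under sampled points (trapezoidal rule)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


# Hypothetical peak: height 1000, centered at position 50, sigma 2 (arbitrary units)
xs = [30 + 0.1 * i for i in range(401)]
ys = [gaussian(x, 1000.0, 50.0, 2.0) for x in xs]
print(round(gaussian_area(1000.0, 2.0), 2), round(trapezoid_area(xs, ys), 2))
```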
Panel B presents the generated ShapeFinder results in a tabular format that shows the calculated peak position, base identification, and other peak information such as width, area, root mean square (RMS) error for positive and negative reactions, alignment to the reference RNA sequence and net absolute reactivities.4
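Conceptually, the net reactivity of each aligned nucleotide is the (+) reagent peak area minus the (−) reagent background. The minimal sketch below shows only that subtraction; `minus_scale` is a hypothetical scaling factor, and any additional corrections ShapeFinder applies are not reproduced.

```python
from typing import Sequence


def net_reactivities(plus_areas: Sequence[float], minus_areas: Sequence[float],
                     minus_scale: float = 1.0) -> list[float]:
    """Background-subtracted reactivity per aligned nucleotide.

    plus_areas / minus_areas: integrated peak areas from the (+) NMIA and (-) DMSO
    channels, already aligned to the same nucleotides. minus_scale is a hypothetical
    factor for matching the two channels' overall intensity before subtraction.
    """
    if len(plus_areas) != len(minus_areas):
        raise ValueError("channels must be aligned to the same nucleotides")
    return [p - minus_scale * m for p, m in zip(plus_areas, minus_areas)]


print(net_reactivities([10.0, 4.0, 2.0], [1.0, 4.0, 3.0]))
# prints: [9.0, 0.0, -1.0]
```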
Translating SHAPE reactivity into RNA structural information
Figure 5 shows results obtained with RNAthor, a computational tool for fast and accurate normalization, visualization and statistical analysis of RNA probing data resolved by CE.5 Analyzing the large data set, covering SHAPE reactivity profiles for up to 864 nucleotides, with RNAthor confirmed the validity of the data with an independent analytical tool. Panel A shows four traces of different lengths with overlapping features. The horizontal axis shows the position of every nucleotide probed and resolved by CGE, and the vertical axis shows the normalized SHAPE reactivity of each base examined for 2’-hydroxyl reactivity. In agreement with the experimental strategy for RNA-SHAPE analysis of a long RNA fragment (Figure 2), RNAthor reconstructed the RNA molecule with SHAPE reactivity information. The green, red, blue and black traces correspond to Primer Sets 1, 2, 3 and 4, respectively.
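Normalization puts reactivities from different experiments on a common scale. A common choice for SHAPE data is 2–8% normalization: the top 2% of values are treated as outliers and the mean of the next 8% becomes the divisor. The sketch below implements that scheme; whether RNAthor applies exactly this method and these rounding rules should be checked in its documentation.

```python
import math
from typing import Optional, Sequence


def normalize_2_8(values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """2-8% normalization of SHAPE reactivities.

    The top 2% of values are treated as outliers and skipped; the mean of the
    next 8% becomes the normalization factor. Missing values stay None.
    """
    data = sorted((v for v in values if v is not None and not math.isnan(v)), reverse=True)
    if not data:
        raise ValueError("no reactivity values to normalize")
    n = len(data)
    n_outliers = n * 2 // 100          # integer arithmetic avoids float rounding
    n_window = max(1, n * 8 // 100)
    window = data[n_outliers:n_outliers + n_window]
    factor = sum(window) / len(window)
    if factor <= 0:
        raise ValueError("normalization factor must be positive")
    return [None if v is None or math.isnan(v) else v / factor for v in values]
```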
Panels B–D show zoomed-in areas of panel A that support the reproducibility of SHAPE reactivity patterns across the reconstructed RNA molecule. In panel B, just before the 200-nucleotide mark, the overlapping green, red and blue traces from three separate RNA-SHAPE experiments indicate a site with high 2’-hydroxyl reactivity. Another set of overlapping traces, in red, blue and black, shows reproducible 2’-hydroxyl reactivity immediately after the 350-nucleotide mark in panel C. Panel D shows two SHAPE experiments covering both shared and unique regions of the RNA fragment; notably, the blue and black traces agree just before the 800-nucleotide mark.
Quantitatively, Table 1 summarizes the robustness of RNA-SHAPE reactivity, illustrating its application to quality attribute assessment of RNA-based products. All four RNA nucleotides (A, C, G and U) were analyzed at seven locations across the long RNA fragment, and the reproducibility of their 2’-hydroxyl reactivity was assessed by calculating the %CV across the experiments covering each nucleotide. Interestingly, the nucleotide at position 191, identified as cytosine (C) in the different experiments, showed a %CV of 8%; this position is suggested to be highly constrained and most likely part of a rigid structural element in this long, single-stranded RNA molecule. The residue at position 61 supported the reproducibility of RNA-SHAPE reactivity values, with a %CV of 2% across two experiments. Notably, a %CV of 1% was observed for guanine (G) at position 318 across two experiments. In summary, this analytical workflow suggests a variability metric for establishing critical quality attribute (CQA) criteria for RNA products throughout the phases of their life cycles.
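The %CV values can be reproduced from replicate reactivities as 100 × (standard deviation / mean). The sketch below assumes the sample standard deviation (n − 1 denominator), which the note does not specify; the example values are hypothetical, not those in Table 1.

```python
import statistics
from typing import Sequence


def percent_cv(values: Sequence[float]) -> float:
    """Coefficient of variation (%) using the sample standard deviation (n - 1)."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("%CV is undefined for a zero mean")
    return 100 * statistics.stdev(values) / mean


# Hypothetical reactivities for one nucleotide from two experiments
print(round(percent_cv([0.80, 0.82]), 2))
```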
Figure 6 presents a statistical analysis of RNA-SHAPE reactivity variation across the experiments using Primer Sets 2, 3 and 4. These RNAthor statistics can be used to assess the reproducibility of experiments or to compare reactivity profiles of RNA probed under different experimental conditions.5 Panel A presents box-and-whisker plots: Fragment 2 (357 nucleotides) has a mean SHAPE reactivity of 0.45 and a median (50th percentile) of 0.23; Fragment 3 (532 nucleotides) has a mean of 0.44 and a median of 0.24; and Fragment 4 (487 nucleotides) has a mean of 0.45 and a median of 0.20.
The violin plots present the distribution of SHAPE reactivity values for these three experiments in a different format, giving a smoothed view of each experiment's range and distribution. A Kruskal-Wallis test on the same data used for the box-and-whisker plots gave a p-value of 0.07.
Because this exceeds the conventional 0.05 significance threshold, no statistically significant difference was detected among the reactivity distributions, consistent with reproducible RNA-SHAPE reactions.
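The Kruskal-Wallis test ranks all reactivities together and asks whether any group's ranks are systematically higher. The sketch below, written for exactly three groups, computes the tie-corrected H statistic; with three groups the chi-square approximation has 2 degrees of freedom, for which the p-value is simply exp(−H/2). It is not RNAthor's implementation; for a general version, SciPy's `scipy.stats.kruskal` is an established alternative.

```python
import math
from collections import Counter
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_three(a, b, c) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and chi-square p-value for exactly three groups."""
    groups = [list(a), list(b), list(c)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group needs at least one value")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_total = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_total / (n ** 3 - n)
    if correction > 0:
        h /= correction
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Three groups -> df = 2; the chi-square survival function is then exp(-H/2)
    p = math.exp(-h / 2)
    return h, p


h, p = kruskal_wallis_three([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))
# prints: 7.2 0.0273
```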
Constrained RNA structural modeling of a large single-stranded RNA molecule
Panel A in Figure 1 shows an abbreviated workflow for RNA structural analysis by CE that can readily be integrated into R&D or quality control for attribute assessment. Panel B in Figure 1 presents RNA-SHAPE reactivity results for nucleotides 1–864 in a single RNA secondary structure model based on SHAPE-derived pseudo-free-energy constraints. In this way, RNA folding prediction algorithms can use experimental reactivity data, as constrained nucleotide values, to build structural models specific to the RNA under study. The legend in Figure 1, panel B maps the calculated SHAPE reactivity values onto a continuous color gradient, with red indicating low and blue indicating high SHAPE reactivity. In the model, red nucleotides were found predominantly in rigid or base-paired regions, while yellow and orange nucleotides were found in RNA loops, suggesting greater flexibility within the RNA molecule. Overall, the secondary structure model in Figure 1, panel B displays the folds most commonly seen in RNA, such as hairpins, helices, bulges, internal loops and multibranch loops. Interestingly, the RNA probing server at the ViennaRNA web service used the provided structural information to extrapolate the folding of the polyA tail.
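The note does not state which constraint method or parameters the ViennaRNA server applied. A widely used approach (Deigan et al., 2009) converts each reactivity into a pseudo-free-energy term, ΔG_SHAPE = m·ln(reactivity + 1) + b, commonly with m = 1.8 and b = −0.6 kcal/mol, applied to nucleotides in stacked base pairs. The sketch below shows that conversion only; it is not a folding algorithm, and the clipping of negative reactivities is an assumption of the sketch.

```python
import math


def shape_pseudo_energy(reactivity: float, m: float = 1.8, b: float = -0.6) -> float:
    """Deigan-style SHAPE pseudo-free energy (kcal/mol): m * ln(reactivity + 1) + b.

    Negative reactivities are clipped to 0 here (an assumption of this sketch).
    Unreactive nucleotides receive a bonus (b < 0) that favors pairing;
    highly reactive nucleotides receive a penalty.
    """
    r = max(reactivity, 0.0)
    return m * math.log(r + 1.0) + b


for r in (0.0, 0.5, 2.0):
    print(r, round(shape_pseudo_energy(r), 3))
```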
Representative sections of the RNA fold modeled with RNAthor are shown in panels A–C of Figure 7. As seen in panel A, the user can apply SHAPE reactivity cut-offs to better understand how individual nucleotides contribute to structural folding. Here, nucleotides with low SHAPE reactivity values were colored black or green, and those with high values red or blue. Panel A illustrates a multibranch loop, panel B shows a bulge at the Clean Cap AG initiator sequence, and panel C shows a helix of highly constrained nucleotides, predominantly C-G base pairs.
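Applying reactivity cut-offs amounts to binning each value into a category. The cut-offs (0.3, 0.7) and labels in the sketch below are hypothetical defaults, not the thresholds used to color Figure 7.

```python
from bisect import bisect_right
from typing import Sequence


def bin_reactivities(values: Sequence[float],
                     cutoffs: Sequence[float] = (0.3, 0.7),
                     labels: Sequence[str] = ("low", "medium", "high")) -> list[str]:
    """Assign each reactivity to a category; a value equal to a cut-off falls in the upper bin.

    The default cut-offs and labels are hypothetical, not the ones used for Figure 7.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cut-offs must be in ascending order")
    return [labels[bisect_right(cutoffs, v)] for v in values]


print(bin_reactivities([0.1, 0.3, 0.5, 0.9]))
# prints: ['low', 'medium', 'medium', 'high']
```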