Structural analysis of a long single-stranded RNA molecule by chemical probing and capillary electrophoresis (CE) 


RNA-SHAPE structural analysis using the GenomeLab GeXP system for robust characterization of therapeutic RNA

Mario A. Pulido1, Jim Thorn2, Jade Tuck3 and Victoria Smith3
1SCIEX, USA; 2SCIEX, UK; 3Center for Process Innovation, UK

Abstract


The rapidly evolving field of RNA therapeutics has driven a need for accurate and efficient nucleic acid characterization. Effective characterization is required to allow for the measurement of product-specific attributes such as RNA structural determination to ensure final product stability. This technical note demonstrates the efficiency of using capillary electrophoresis (CE) to address RNA identity and structural stability challenges throughout the life cycle of RNA-based products. A simplified three-day workflow for reliable RNA structural determination using CE and chemical probing delivered robust structural information for a 1,084-nucleotide in vitro transcribed RNA molecule. The selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) reaction was highly reproducible for a subset of probed nucleotides with a coefficient of variation (%CV) of <5% in the SHAPE reactivity.

Introduction


In this study, the SHAPE reaction was applied to measure nucleotide structure (Figure 1).1,2 The mechanistic principle of the SHAPE reaction allowed identification of unconstrained nucleotides along a 1,084-nucleotide single-stranded RNA molecule. This in vitro transcribed RNA is capped with a 5’ cap 1 structure, encodes a green fluorescent protein (GFP) and has a polyA tail at the 3’ end. The RNA was interrogated for the nucleophilicity of the 2’-hydroxyl at each nucleotide by using the N-methylisatoic anhydride (NMIA) reagent (Figure 2). Previous studies have demonstrated the consistency of chemical probing and capillary gel electrophoresis (CGE) for high-throughput characterization of RNA molecules.1-3 As a result, the application of the RNA-SHAPE reaction was explored to further support the characterization of relatively long, single-stranded RNA products, demonstrating qualitative and quantitative results for RNA evaluation. This readily available workflow offers another layer of inquiry, verification and quality control attributes to account for the structural elements of RNA molecules.

Key features of RNA-SHAPE analysis by CE
 

  • Enables informative RNA structural analysis for characterization of common RNA folding domains, such as hairpin, helix, bulging, internal looping, and multi-branched looping

  • The output data file from the GenomeLab GeXP system can be independently verified by computational tools such as ShapeFinder software and the RNAthor tool

  • Comprehensive RNA molecular identification and structural analysis provides product-specific attribute information throughout the life cycle of RNA therapeutics production

Figure 1. RNA structure analysis guided by constrained nucleotide values. The SHAPE reaction, a type of chemical probing for RNA characterization, is shown. Panel A indicates an abbreviated workflow for SHAPE analysis, and panel B indicates the calculated structure of the 1,084-nucleotide mRNA molecule guided by 864 constrained nucleotides.

Materials and methods


Nucleic acid samples: The 1,084-nucleotide mRNA sample was provided by the Center for Process Innovation (CPI) in the UK. The in vitro transcribed single-stranded RNA was synthesized from a DNA plasmid template that encoded the molecular architecture required to produce mRNA. This included a T7 promoter sequence to initiate transcription, a 5’UTR, the coding sequence for GFP, a 3’UTR and a polyA tail. The DNA plasmid sample for sequencing reactions was also provided by CPI. Both the mRNA and the DNA plasmid samples were stored at -80°C until time of analysis.

Primers for SHAPE reaction: Each primer set consisted of primers with an identical nucleic acid sequence labeled with different fluorophores. Specifically, each primer set contained four reverse primers with the corresponding targeting sequence, labeled with Cy5, Cy5.5, Alexa 750 and IR800CW, respectively. The starting positions of the five reverse primer sets covering the 1,084-nucleotide mRNA transcript were 322, 517, 836, 944 and 1,084, respectively. Consistent with primer design conventions, these primers were ~20 nucleotides long with a GC content of ~50%. A total of 20 HPLC-purified and lyophilized primers at a 100 nmol scale were obtained from Integrated DNA Technologies (Coralville, Iowa).
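The primer design conventions mentioned above (length of ~20 nucleotides, GC content of ~50%) can be checked with a few lines of Python. This is only a minimal sketch: the primer sequence shown is hypothetical and is not one of the primers used in this study, and the function name gc_fraction is an illustrative choice.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 20-nucleotide primer, for illustration only
primer = "ACGTTGCAAGCTTGACCTGA"

primer_length = len(primer)
primer_gc = gc_fraction(primer)
print(f"length = {primer_length} nt, GC content = {primer_gc:.0%}")
```

The same check can be applied to each candidate reverse primer before ordering.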

Figure 2. Overview for RNA-SHAPE analysis of a long RNA fragment. A critical step for RNA structural determination by capillary electrophoresis is RNA-acylation by a highly electrophilic reagent, such as NMIA. Reverse primers aligning at specific positions against the RNA target provided acceptable coverage for determination of RNA-acylation sites.

Molecular biology reagents: NMIA (PN M25), SuperScript III (PN 18080044), dNTP Mix (PN R0192), Sequenase Cycle Sequencing Kit (PN 78500), Magnesium Chloride (PN AM9530G), Potassium Chloride (PN AM9640G), Tris-EDTA (TE) buffer (PN 12090015), Tris-HCl, pH 8.0 (PN 15568025), Ethanol (PN BP2818), EDTA (PN AAJ15694AE) and nuclease-free water (not DEPC-treated, PN AM9930) were purchased from Thermo Fisher Scientific (Waltham, Massachusetts). Dimethyl Sulfoxide (DMSO, PN D8418), Sodium Acetate, 3M pH 5.2 (PN S7899), Sodium Hydroxide (PN 72068) and Hydrochloric Acid (PN H1758) were purchased from Sigma-Aldrich (St. Louis, Missouri).

CGE: The DNA sep cap array 33-75B (PN 608087), separation buffer (PN 608012), mineral oil (PN 608114), separation gel (20 mL; PN 391438) and the sample loading solution (SLS, PN 608082) were obtained from SCIEX (Framingham, Massachusetts).

DNA purification: CleanSEQ paramagnetic beads (PN A29151) and the magnetic plate (PN A32782) were obtained from Beckman Coulter Life Sciences (Indianapolis, Indiana). 

Preparing the RNA-SHAPE reaction:

RNA conditioning and RNA-2’-O-adduct formation

For each SHAPE reaction, about 12 pmol of RNA was used for conditioning the RNA in a high salt concentration buffer containing magnesium and potassium chloride, as described in a previous publication.3 Next, the conditioned RNA was split into two equal volumes (72 μL per reaction): one was used for NMIA chemical probing and the second was kept as intact or control RNA in the presence of DMSO. Specifically, the RNA-acylation reaction or positive SHAPE reaction providing conditions for RNA-2’-O-adduct formation was prepared by vigorously adding 8 μL of a 25 mM NMIA solution into the 72 μL sample containing the conditioned RNA and allowing it to incubate at 37°C for about 50 minutes. The control RNA or negative reaction was prepared by adding 8 μL of DMSO into the second vial with 72 μL of sample containing the conditioned RNA. The modified and control RNA material were recovered by ethanol precipitation, reconstituted with 5 μL of TE buffer and then stored overnight at -20°C.
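As a quick arithmetic check of the probing step above, the final NMIA concentration after adding 8 μL of a 25 mM solution to 72 μL of conditioned RNA can be estimated as follows. This is a sketch that assumes the two volumes are simply additive; the function name is illustrative.

```python
def final_concentration_mM(stock_mM, stock_uL, sample_uL):
    """Concentration after diluting stock_uL of stock into sample_uL of sample.

    Assumes the two volumes are additive.
    """
    total_uL = stock_uL + sample_uL
    return stock_mM * stock_uL / total_uL


# Volumes and stock concentration taken from the protocol text
nmia_final_mM = final_concentration_mM(stock_mM=25.0, stock_uL=8.0, sample_uL=72.0)
added_fraction = 8.0 / (8.0 + 72.0)

print(f"Final NMIA concentration: {nmia_final_mM:.2f} mM")
print(f"Added reagent volume fraction: {added_fraction:.0%}")
```

Under that assumption, the positive reaction contains NMIA at 2.5 mM in an 80 μL volume, and the negative reaction receives the same 10% volume of DMSO.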

Reverse transcription with Cy5 and Cy5.5 primers

A total of 1 μL of 10 μM Cy5 primer was added to the modified RNA (positive reaction) after the 5 μL of reconstituted RNA material was thawed on ice. Similarly, the control RNA (negative reaction) was probed by adding 1 μL of 10 μM Cy5.5 primer to the 5 μL of reconstituted RNA material. As previously described, a 2.5X reverse transcription mixture was added to the positive and negative reactions, respectively, and allowed to synthesize cDNA fragments for 50 minutes at 50°C.3 The positive and negative SHAPE reactions were combined, followed by nucleic acid purification by ethanol precipitation, and then resuspended with SLS solution and kept at 4°C until time for CE analysis.

Sequencing ladders with Alexa 750 and IR800CW primers

To determine the identity of RNA residues enriched for RNA-2’-O-adduct formation, thus indicative of RNA folding, a matching DNA plasmid to the in vitro transcribed RNA was used for DNA sequencing. Two separate DNA sequencing reactions, one using ddC-based termination and the second using ddT-based termination reagents provided in the Sequenase Cycle Sequencing Kit, were prepared by using 2.5 μL of DNA template diluted at 0.04 μg/μL. As described in the manual for the Sequenase Cycle Sequencing Kit, a 12 μL reaction volume enhanced with 10% DMSO, containing 1 μL of the corresponding primer at 10 μM (Alexa 750 for ddC-based termination or IR800CW for ddT-based termination), was mixed with an equal volume (12 μL) of the corresponding termination vial (ddC or ddT). The sequencing reactions were placed in a thermal cycler and amplified using a 46-cycle program with the following steps: 95°C for 30 seconds, 55°C for 30 seconds and 72°C for 1 minute. Amplified products were purified with the CleanSEQ paramagnetic beads as described in the product’s manual. The sequencing ladder samples were stored at 4°C until time of CE analysis.

CGE for SHAPE reactivity profiling

The combined positive and negative reactions were mixed with the corresponding sequencing ladder reactions. For example, a complete SHAPE reaction for CGE analysis was made from a mixture with a final volume of 40 μL containing 45% of the combined positive and negative reaction sample, 20% of the sequencing reaction using the ddC-based termination and 35% of the sequencing reaction using the ddT-based termination. This mixing ratio by volume was determined to be an optimized condition for all primer sets and reactions described for this study. The nucleic acid products were separated by automated CGE, as previously described. Primer crosstalk and mobility-shift correction reactions were performed and analyzed with the GenomeLab GeXP system, as previously described.4 Data files were retrieved from the GenomeLab GeXP system for SHAPE reactivity calculations and RNA structural modeling. 
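The mixing ratio described above translates into pipetting volumes as shown in this short sketch. The percentages and the 40 μL final volume come from the text; the dictionary keys and function name are illustrative.

```python
TOTAL_UL = 40.0

# Volume fractions stated in the method
FRACTIONS = {
    "shape_pos_neg": 0.45,  # combined positive and negative SHAPE reactions
    "ladder_ddC": 0.20,     # sequencing reaction with ddC-based termination
    "ladder_ddT": 0.35,     # sequencing reaction with ddT-based termination
}


def component_volumes(total_uL, fractions):
    """Convert volume fractions into microliter volumes for a given total."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return {name: round(total_uL * frac, 2) for name, frac in fractions.items()}


volumes = component_volumes(TOTAL_UL, FRACTIONS)
for name, vol in volumes.items():
    print(f"{name}: {vol} uL")
```

For a 40 μL sample this corresponds to 18 μL of the combined SHAPE reactions, 8 μL of the ddC ladder and 14 μL of the ddT ladder.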

Data analysis and RNA structural modeling

Primer crosstalk adjustment, mobility shift corrections, alignment of the SHAPE reactions to the reference sequence and SHAPE reactivity value calculations were performed using ShapeFinder software. As needed, SHAPE value constrained files were created using Excel and converted to a .txt file. The web servers RNAthor (rnathor.cs.put.poznan.pl) and the ViennaRNA web services (rna.tbi.univie.ac.at) were applied for RNA structural modeling and statistical analysis of RNA-acylation distribution across the RNA fragment.
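As an alternative to preparing the constraint file by hand in a spreadsheet, a plain-text file can be generated with a short script. The sketch below assumes a two-column, tab-separated layout (nucleotide position, reactivity) with a sentinel value of -999 for positions without data. That layout is a convention used by several folding tools, but it is an assumption here; the exact format expected by a given web server should be checked before upload. The reactivity values are hypothetical.

```python
def format_shape_constraints(reactivities, length, missing=-999.0):
    """Build tab-separated 'position<TAB>reactivity' lines for positions 1..length.

    Positions absent from `reactivities` receive the `missing` sentinel.
    """
    lines = []
    for position in range(1, length + 1):
        value = reactivities.get(position, missing)
        lines.append(f"{position}\t{value:.3f}")
    return "\n".join(lines) + "\n"


# Hypothetical reactivities for a 5-nucleotide stretch; positions 3 and 5 have no data
example_reactivities = {1: 0.12, 2: 0.85, 4: 0.03}
constraint_text = format_shape_constraints(example_reactivities, length=5)
print(constraint_text, end="")
```

The resulting string can be written to a .txt file with the standard open() call.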

Results


RNA-2’-O-adduct formation: In support of the RNA structure-function relationship and its role in effective therapeutics, this results section demonstrates the reproducibility of the RNA structural determination of a relatively large RNA molecule by CGE. As a model system, an in vitro transcribed RNA product with minor matrix influence on the analyte was examined. Future studies assessing RNA folding in the context of an encapsulating matrix, such as lipid nanoparticles, can provide structural insight into the stringency and stability of RNA products.

Figure 3 illustrates different electropherograms generated by the GenomeLab GeXP system that correspond to four different RNA-SHAPE reactions, as described in the materials and methods section and as depicted in the overview for RNA-SHAPE analysis of an extended RNA molecule (Figure 2). The GenomeLab GeXP system demonstrated single-base resolution for nucleotide nucleophilic reactivity with an acylating reagent. Panels A–D in Figure 3 indicate RNA structural folding information for residues 1–864, encompassing the Clean Cap structural element and the coding sequence of the green fluorescent protein. The fifth primer set was applied to evaluate the polyA tail structural element. However, non-specific priming was observed and led to an inconclusive or incomplete alignment of the polyA tail to the reference sequence. While studies have suggested the insertion of RNA sequences flanked by structural elements of an expression cassette, this strategy could result in skewed structural models. This limitation of in vitro transcription design calls for future studies that incorporate benign structural elements as part of RNA final products. Nevertheless, this technical note demonstrates a workflow for RNA structural determination with minimal sample preparation complexity and basic computational requirements, using semi-automated software for SHAPE reactivity. The RNA structural models utilizing SHAPE reactivity values can be constructed by using established web servers and optimized by selecting a variety of thresholds and computational parameters.

Figure 3. Electropherograms generated by the GenomeLab GeXP system from RNA-SHAPE reactions. A SHAPE reaction is defined by the presence of an RNA molecule treated with an acylating reagent such as NMIA (blue trace), RNA treated with DMSO as the control reaction (green trace) and two distinct sequencing ladders (black and red traces). Panels A–D represent four different SHAPE reactions providing nucleotide structural information for 864 nucleotides. Panel A covers SHAPE structural information starting at nucleotide position 322. Panels B, C and D show cDNA product structural information starting at nucleotide positions 517, 836 and 944, respectively.

Panel A shows an electropherogram generated by the GenomeLab GeXP system with nucleic acid products eluting up to 50 minutes. These cDNA products generated by using Primer Set 1 correspond to SHAPE reactivity profiling from nucleotide position 1 up to base 322. As indicated in the electropherogram, four different reactions can be observed, and the blue and continuous peaks indicate the SHAPE positive reactions. The variability of the blue peaks suggests the measurement of unconstrained nucleotides. The green and continuous peaks indicate the background measurement of unconstrained nucleotides of the RNA in the presence of an inert reagent, such as DMSO. The black and red traces demonstrate the presence of sequencing ladders. In this case, the black peaks correspond to G-nucleotides, and red peaks represent A-nucleotides. In combination, the G and A nucleotides are essential to determine the identity of each nucleotide in the SHAPE reaction. 

Panels B–C utilized the same sequencing ladder strategy throughout this study. In this case, the black peaks corresponded to G-nucleotides, and the red peaks represented A-nucleotides. As shown in panel B, the cDNA products were resolved up to ~70 minutes, supporting the mapping design shown in Figure 2. This means that Primer Set 2 annealed at position 517 of the RNA molecule and extended nucleic acid products toward the Clean Cap end. Briefly, the panel B electropherogram suggests various hot spots for RNA-2’-O-adduct formation, for example, the relatively high blue peaks for products eluting around 35 minutes, 42 minutes and 55 minutes. Regions in the electropherogram that appear flat or with a relatively lower blue peak intensity (for example, the fragments eluting between 45 and 50 minutes or between 60 and ~64 minutes) suggest nucleotides with a lower nucleophilicity of the 2’-hydroxyl toward the NMIA reagent.

Panel C overall showed a higher degree of nucleophilic reactivity of the RNA-2’-hydroxyl for nucleotides found in the middle of the large mRNA fragment. The Primer Set 3 binding site starting at position 836 produced a relatively large cDNA pool of nucleic acid products, in this case resulting in a coverage of 572 residues. Interestingly, a hard stop was observed for nucleic acid products eluting at about 80 minutes, an observation that was reproduced by fragments generated using Primer Set 4, as shown in panel D, specifically for fragments eluting at about 90 minutes. 

In contrast to panels A and B, panel D showed coverage of 548 residues. This panel showed a more significant similarity with the results obtained in panel C. This suggests greater flexibility or propensity for nucleophilic reactivity of the RNA-2’-hydroxyl. Information from individual measurements of nucleotide structure can be used for constructing an RNA model with ShapeFinder software by utilizing these individual SHAPE reactivity or constrained values. 

Figure 4. Summary for results obtained with ShapeFinder software using an input file from the GenomeLab GeXP system. Panel A shows a representative section that demonstrates peak intensity (blue and green traces) and nucleotide position (black and red traces) for all peaks included for analysis. Panel B shows a representative section of the calculated constrained values for each base resolved by the electropherogram generated by the GenomeLab GeXP system.

High-throughput analysis of RNA-chemical probing by CGE

Deciphering the reactivity profiling generated by the GenomeLab GeXP system into meaningful SHAPE reactivity values for RNA structural modeling was accomplished using ShapeFinder software.4 Figure 4 shows a summary of results obtained with ShapeFinder using an input file from the GenomeLab GeXP system. The representative data is based on the SHAPE reaction targeting an RNA fragment starting at position 836, as described in the materials and methods. Panel A shows the data used for obtaining a SHAPE reactivity value for each base involved in the SHAPE reaction. Briefly, this analysis shows both peak intensity and position for all peaks in the positive and negative reagent channels, including Gaussian integration calculations linked to the input RNA reference sequence.4 This semi-automated analytical process was performed for the four different SHAPE reactions as described for Primer Sets 1–4 to obtain an Integrated Peaks File with calculated absolute SHAPE reactivities for every peak seen in the separation analysis performed by the GenomeLab GeXP system.

Panel B presents the generated ShapeFinder results in a tabular format that shows the calculated peak position, base identification, and other peak information such as width, area, root mean square (RMS) error for positive and negative reactions, alignment to the reference RNA sequence and net absolute reactivities.4

Translating SHAPE reactivity into RNA structural information

Figure 5 illustrates results obtained by using RNAthor, a computational tool for fast and accurate normalization, visualization and statistical analysis of RNA probing data resolved by CE.5 Briefly, analyzing the large data set encompassing SHAPE reactivity profiling for up to 864 nucleotides with RNAthor allowed for the confirmation of the validity of the data using an independent analytical tool. Panel A shows four different traces of different lengths and with overlapping features. Overall, the horizontal axis shows the position number of every nucleotide probed and resolved by CGE. The vertical axis shows a normalized SHAPE reactivity value for every base examined for nucleophilicity of the 2’-hydroxyl. In agreement with the experimental strategy for RNA-SHAPE analysis of a long RNA fragment, as shown in Figure 2, RNAthor reconstructed the RNA molecule with SHAPE reactivity value information. The green trace corresponds to results obtained with Primer Set 1, the red trace shows results using Primer Set 2, the blue trace shows results obtained with Primer Set 3 and the black trace shows results generated with Primer Set 4.

Panels B–D show zoomed-in areas of panel A in support of the reproducibility of SHAPE reactivity patterns across the reconstructed RNA molecule. In panel B, at a location prior to the 200-nucleotide mark, three separate RNA-SHAPE experiments suggest that this location is prone to nucleophilicity of the 2’-hydroxyl, indicated by the overlaying green, red and blue traces. An additional set of overlapping traces suggesting reproducibility of the nucleophilicity of the 2’-hydroxyl can be seen in panel C, immediately after the 350-nucleotide mark indicated by the overlaying red, blue and black traces. Panel D indicates two SHAPE reaction experiments covering similar and unique areas of the RNA fragment, respectively. Notably, the blue and black traces demonstrate accordance at a location prior to the 800-nucleotide mark.

Figure 5. Automatic RNA-SHAPE data normalization and alignment to the reference sequence. Panel A shows a cumulative step plot with RNA structural information from four different SHAPE reactions (green, blue, red and black traces) covering the 1,084-nucleotide single-stranded RNA molecule. The bottom axis represents nucleotide position, and the vertical axis shows the RNAthor-normalized SHAPE reactivity value. Panels B–D represent zoomed-in regions for selected areas in panel A. The overlapping traces suggest reproducibility of RNA-SHAPE chemical probing resolved by the GenomeLab GeXP system.

Table 1. RNA-SHAPE reproducibility.

Quantitatively, Table 1 summarizes the robustness of the RNA-SHAPE reactivity, illustrating its application to quality attribute assessment of RNA-based products. In this case, the four different RNA nucleotides (A, C, G and U) were analyzed at seven different locations across the long RNA fragment, and they were assessed for the degree of the nucleophilicity of the 2’-hydroxyl by calculating the respective %CV from the number of experiments covering these specific nucleotides. Interestingly, the nucleotide at position 191, identified as cytosine (C) in different experiments, demonstrated a %CV of 8%, suggesting that it is highly constrained and most likely part of a rigid structural element in this long, single-stranded RNA molecule. The residue at position 61 supported the reproducibility of the RNA-SHAPE reactivity values based on two different experiments with a %CV of 2%. Notably, a %CV of 1% was observed for guanine (G) at position 318 from results obtained from two experiments. In summary, this analytical workflow suggests a variability metric for establishing critical quality attribute (CQA) criteria for RNA products throughout the multiple phases of their life cycles.
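The %CV used above is the standard deviation of the replicate reactivity values divided by their mean, expressed as a percentage. The sketch below shows one way to compute it. It assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the replicate values are hypothetical rather than taken from Table 1.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are required")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return stdev(values) / average * 100.0


# Hypothetical normalized SHAPE reactivities for one nucleotide
# measured in three overlapping primer-extension experiments
replicates = [0.98, 1.00, 1.02]
cv = percent_cv(replicates)
print(f"%CV = {cv:.1f}%")
```

With only two overlapping experiments, as for some positions discussed above, the estimate rests on a single degree of freedom and should be interpreted with that in mind.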

Figure 6 provides statistical analysis for RNA-SHAPE reactivity variation for different SHAPE experiments using Primer Sets 2, 3 and 4. These statistical results produced by RNAthor provide substance for the reproducibility of performed experiments or for comparing reactivity profiles of RNA probed in different experimental conditions.5 In summary, panel A presents a box-and-whisker plot, where Fragment 2 shows a mean SHAPE reactivity value of 0.45 and a 50th-percentile of 0.23 for 357 assessed nucleotides. Fragment 3, based on 532 counts or nucleotides, shows a mean SHAPE reactivity value of 0.44 and a 50th-percentile of 0.24. Fragment 4, based on 487 counts or nucleotides, shows a mean SHAPE reactivity value of 0.45 and a 50th-percentile of 0.20.

The violin plots illustrate the distribution of SHAPE reactivity values across these three different SHAPE experiments in a different format. A p-value of 0.07 was obtained from the Kruskal-Wallis test using the same data set that was used in the box-and-whisker plots.

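Because this p-value exceeds the conventional 0.05 significance threshold, the test does not indicate a statistically significant difference among these RNA-SHAPE reactions, which is consistent with reproducible reactivity distributions. The violin plots provide a smooth visual for each experiment's range and distribution of SHAPE reactivity.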
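For readers who want to reproduce this type of comparison outside RNAthor, the Kruskal-Wallis test for three groups can be sketched with the Python standard library alone. The implementation below uses average ranks with the usual tie correction and the chi-square approximation, whose survival function for 2 degrees of freedom is exp(-H/2). It is a minimal illustration, not RNAthor's implementation, and the reactivity values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_3groups(a, b, c):
    """Kruskal-Wallis H statistic and chi-square (df = 2) p-value for three groups."""
    groups = [a, b, c]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one value")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction

    p_value = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p_value


# Hypothetical normalized reactivities for three overlapping fragments
frag2 = [0.10, 0.23, 0.45, 0.80, 0.05]
frag3 = [0.12, 0.24, 0.44, 0.75, 0.07]
frag4 = [0.09, 0.20, 0.46, 0.82, 0.04]
h_stat, p_value = kruskal_wallis_3groups(frag2, frag3, frag4)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

The chi-square approximation is reasonable for group sizes like those reported here (several hundred nucleotides per fragment) but is less reliable for very small groups.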

Figure 6. Box-and-whisker plot (left) and violin plot (right) illustrating reactivity data distribution for Fragments 2, 3 and 4. Fragment 2 represents the RNA-SHAPE reaction starting at nucleotide position 517, Fragment 3 represents the RNA-SHAPE reaction starting at nucleotide position 836 and Fragment 4 represents the RNA-SHAPE reaction starting at nucleotide position 944. Visual inspection suggests RNA-SHAPE reactivity consistency among these three different reactions.

Figure 7. RNA structural modeling by RNAthor. Panel A shows multibranched looping, panel B shows Clean Cap identification and bulge folding and panel C shows GC-region in helix conformation.

Constrained RNA structural modeling of a large single-stranded RNA molecule

Panel A in Figure 1 shows an abbreviated workflow for RNA structural analysis by CE that can readily be integrated into R&D or quality control to assess attributes. Panel B in Figure 1 presents RNA-SHAPE reactivity results from nucleotides 1–864 in a single RNA secondary structure model based on constrained pseudo energies. As a result, RNA folding prediction algorithms can be instructed on utilizing practical information or constrained nucleotide values for constructing unique RNA structural models. The legend in panel B in Figure 1 corresponds to the range of calculated SHAPE reactivity values categorized into a continuous reactivity gradient. In this case, red values indicate low SHAPE reactivity and blue values indicate high SHAPE reactivity. As noted in the RNA structure model, nucleotides labeled as red were predominantly found in rigid areas or involved in base-pairing, while yellow and orange nucleotides were found in RNA loops, suggesting their high level of flexibility within the RNA molecule. Overall, the RNA secondary structure model in panel B in Figure 1 exhibited the most common folds seen for RNA, such as hairpin, helix, bulging, internal looping and multi-branched looping. Interestingly, the RNA probing server at the ViennaRNA web service utilized the provided structural information to extrapolate the folding of the polyA tail.
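To illustrate how a SHAPE reactivity can enter a folding calculation as a pseudo energy, the sketch below uses the widely cited linear-log relationship, pseudo energy = m × ln(reactivity + 1) + b. This technical note does not state which conversion or parameters were applied by the web server, so the formula choice, the default slope and intercept (m = 2.6 and b = -0.8 kcal/mol, values commonly quoted for this approach) and the treatment of missing data are all assumptions made for illustration.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo free-energy term (kcal/mol) for one nucleotide.

    Uses m * ln(reactivity + 1) + b. Negative reactivities are treated
    as missing data and contribute no energy term (an assumption).
    """
    if reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Hypothetical normalized reactivities: constrained, intermediate, flexible, no data
for value in (0.0, 0.3, 1.0, -999.0):
    print(f"reactivity {value:>7}: {shape_pseudo_energy(value):+.2f} kcal/mol")
```

In this scheme, low reactivities yield a negative term that favors base-pairing at that position, while high reactivities yield a positive term that penalizes it, which matches the qualitative interpretation of constrained and flexible nucleotides given above.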

Representative sections of RNA folding modeled with RNAthor are shown in panels A–C in Figure 7. As seen in panel A, the user can apply SHAPE reactivity value cut-offs to better understand how these nucleotides play a role in structural folding. In this case, nucleotides with low SHAPE reactivity values were color-coded as black and green, while nucleotides with high reactivity values were labeled red and blue. Panel A illustrates a multi-branched looping, panel B shows the presence of bulging of the Clean Cap AG initiator sequence and panel C shows a helix structure composed of highly constrained nucleotides, predominantly C-G base-pairing.  

Conclusion
 

  • Multi-capillary capability provided high-throughput RNA structural analysis with excellent %CV. Skillful analysts could perform up to four SHAPE reactions within a three-day turnaround time. Inconsistencies with sample preparation commonly observed with X-ray crystallography were addressed. Independent experiments or reads showed nucleophilicity of the 2’-hydroxyl with %CV values of 1%.

  • RNA-SHAPE analysis can be used to confirm unique structural elements, such as the presence of the Clean Cap AG initiator sequence.

  • Automatic RNA-SHAPE data normalization and verification by independent software tools, such as RNAthor, can be used to assay reproducibility and robustness.

  • RNA-SHAPE analysis can be used to confirm unique RNA secondary structures, such as hairpin, helix, bulging, internal looping and multi-branched looping.

  • RNA structural information can be extrapolated for RNA identity and structural stability studies throughout the life cycle of RNA-based therapeutics.

  • The RNA-SHAPE reaction provides evidence-based RNA folding, resulting in accurate RNA structural models, which can be informative for assessing the effects of matrices or delivery technologies, such as lipid nanoparticles, on RNA final products.

References
 

  1. Kenyon, J.; Prestwood, L.; Lever, A. Current perspectives on RNA secondary structure probing. Biochem. Soc. Trans., 2014, 42 (4), 1251-1255.

  2. Wilkinson, K.A.; Merino, E.J.; Weeks, K.M. Selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc., 2006, 1, 1610-1616.

  3. Lusvarghi, S.; Sztuba-Solinska, J. et al. RNA secondary structure prediction using high-throughput SHAPE. J. Vis. Exp., 2013, (75), e50243.

  4. Vasa, S.M. et al. ShapeFinder: A software system for high-throughput quantitative analysis of nucleic acid reactivity information resolved by capillary electrophoresis. RNA, 2008, 14, 1979-1990.

  5. Gumna, J. et al. RNAthor – fast, accurate normalization, visualization and statistical analysis of RNA probing data resolved by capillary electrophoresis. PLoS ONE, 2020, 15 (10), e0239287.