Analysis of untargeted metabolomics data from a Zeno data-dependent acquisition (DDA) workflow using SCIEX OS software


A metabolomics data processing pipeline for DDA data acquired on the ZenoTOF 7600 system

Rebekah Sayers1, Robert Proos2 and Paul RS Baker2
1SCIEX UK; 2SCIEX US

Abstract


This technical note describes best practices for analyzing untargeted metabolomics data acquired using the ZenoTOF 7600 system. Previous work has provided a step-by-step overview of instrument parameter settings to obtain optimal untargeted data (reference 1). This work explains the settings and the subtle changes to the processing method in SCIEX OS software that are needed to maximize coverage of the metabolome.

Introduction


Metabolites are chemically and structurally diverse compounds that are found endogenously at a wide dynamic concentration range, which makes the metabolome challenging to characterize.

Figure 1. Targeted and untargeted metabolite screening workflows on the ZenoTOF 7600 system. 

Mass spectrometry is typically used for metabolomics analysis in complex biological samples. DDA is a commonly used analytical technique for untargeted metabolomics to detect and potentially quantify all metabolites in a sample (Figure 1). In this experiment, a TOF MS scan detects all precursor ions within a specified mass range, and MS/MS scans are triggered for those precursors that pass pre-defined criteria. Based on adjustments to the criteria, an ideal experiment should provide as much coverage of the metabolites as possible, while maintaining good MS/MS spectral quality. A complementary technical note detailing the optimal instrument parameter settings for DDA analysis using the ZenoTOF 7600 system is available online (reference 1).

Many metabolites are present at low endogenous concentrations due to the nature of the metabolome, which can lead to low-quality spectra and poor coverage. However, the Zeno trap provides significant gains in MS/MS sensitivity and improves MS/MS spectral quality. The use of the Zeno trap can therefore result in higher confidence spectral matching and greater metabolite coverage. Spectral matching is currently reliant on libraries obtained from instruments without a Zeno trap. Therefore, the processing methods must be optimized to realize the true potential and impact of the data.

Key features of processing Zeno DDA data using SCIEX OS software 
 

  • An untargeted metabolomics workflow with the Zeno trap enabled provides high-quality MS/MS spectra

  • Current MS/MS spectral libraries contain fragmentation data acquired on platforms without a Zeno trap, which differ in the ratio/intensity of fragment ions

  • To maximize the impact of metabolomics data acquired on the ZenoTOF 7600 system, specific parameter settings within the data processing method should be used

Methods


Sample preparation: NIST SRM 1950 plasma was extracted using 4 volumes of ice-cold methanol. After centrifugation to separate the precipitated proteins, the supernatant was directly analyzed by high-performance liquid chromatography-electrospray ionization tandem mass spectrometry (HPLC-ESI-MS/MS).

Chromatography: Samples were analyzed using an Exion LC system with a Kinetex F5 column (2.1 × 150 mm, 2.6 µm, Phenomenex). A simple linear gradient from 0 to 95% B was used with standard reversed phase mobile phases at a flow rate of 200 µL/min. Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile. A 1 µL injection volume was used and the column temperature was maintained at 30°C throughout the analysis. The total runtime was 20 min.

Mass spectrometry: The extracted sample was analyzed on the ZenoTOF 7600 system equipped with the OptiFlow Turbo V ion source. Data were collected using a top-40 DDA method, with dynamic background subtraction (DBS) and exclusion for 6 s after 3 occurrences. The method used a TOF MS accumulation time of 100 ms, collision energy (CE) of 30 V and a TOF MS/MS accumulation time of 5 ms. For a detailed explanation of parameter settings used for DDA analysis on the ZenoTOF 7600 system, see reference 1.
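
As a rough illustration of how these acquisition settings combine, the sketch below estimates the duration of one DDA cycle from the accumulation times stated above. It assumes that inter-scan overhead is zero, which a real instrument would not achieve, so the result should be read as a lower bound. The function name is illustrative and is not part of SCIEX OS software.

```python
def estimate_cycle_time_ms(ms1_accumulation_ms, msms_accumulation_ms, top_n):
    """Lower-bound cycle time: one TOF MS scan plus top_n TOF MS/MS scans.

    Assumption: inter-scan overhead is ignored, so real cycles are longer.
    """
    return ms1_accumulation_ms + top_n * msms_accumulation_ms


# Values from the method described above: 100 ms TOF MS, 5 ms TOF MS/MS, top 40.
cycle_ms = estimate_cycle_time_ms(100, 5, 40)
print(f"Estimated minimum cycle time: {cycle_ms} ms")
```

Under these assumptions, a chromatographic peak that is several seconds wide would be sampled by multiple TOF MS scans even when all 40 MS/MS slots are used.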

Data processing: All data were analyzed using the Analytics module in SCIEX OS software. The MS/MS spectra were compared to the library spectra using the SCIEX All-in-One HRMS/MS library version 2.0, SCIEX Accurate Mass Metabolite Spectral Library version 2.0 and NIST 2017 MS/MS library. Statistical analysis was performed using MarkerView software. 

SCIEX OS software: Project Default Settings


Quantitative processing parameters (peak integration) 

SCIEX OS software can be used to process all data types acquired on both triple quadrupole and QTOF instruments, and processing parameters can be adjusted depending on the type of analysis and data processing used (Fig. 2). SCIEX OS software has 3 integration algorithms that can be used to integrate peaks. These options are listed in the Project Default Settings in the Analytics module of SCIEX OS software (Fig. 3).

MQ4: This algorithm selects a low, but not the lowest, concentration standard or quality control sample as the representative sample for the analytical run. This algorithm requires some user input to define the integration parameters but can enable faster searching. This algorithm is therefore preferentially used for targeted and untargeted identification workflows.

AutoPeak: This algorithm selects a high, but not saturated, concentration standard or quality control sample as the representative sample of the analytical run. Chromatograms within the batch being processed are evaluated to determine which sample is the best peak model for each transition. The peak model is constructed based on a model of 3 Gaussian peaks. This algorithm requires little user input because few parameters are adjustable. This is the optimal choice for targeted quantitative analysis.

Summation: This algorithm does not perform a normal peak search and instead assumes that a peak is present close to the expected retention time. This algorithm works well with flow-injection analysis (FIA) or infusion data.

When performing quantitative mass spectrometry data processing, it is important to determine whether a given peak is significant and exceeds background noise. There are 3 signal-to-noise algorithms available in the software, including relative noise, standard deviation and peak-to-peak. The latter 2 require the noise region and the peak of interest to be selected on the chromatogram and are therefore not suitable for untargeted data analysis. The relative noise algorithm provides a less subjective means of determining signal-to-noise calculations and allows a background to be calculated even when no peak-free part of the chromatogram is available.
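
To show why the standard deviation and peak-to-peak approaches depend on a user-selected noise region, the sketch below computes both ratios from a short list of noise intensities. It uses one common textbook convention for each ratio; conventions differ (for example, some divide by a multiple of the standard deviation), and this is not presented as the calculation used by SCIEX OS software. The intensity values and function names are hypothetical.

```python
import statistics


def snr_peak_to_peak(peak_height, noise_region):
    """S/N as peak height divided by the peak-to-peak range of the noise region."""
    noise = max(noise_region) - min(noise_region)
    if noise == 0:
        raise ValueError("noise region is flat; S/N is undefined")
    return peak_height / noise


def snr_standard_deviation(peak_height, noise_region):
    """S/N as peak height divided by the sample standard deviation of the noise."""
    noise = statistics.stdev(noise_region)
    if noise == 0:
        raise ValueError("noise region is flat; S/N is undefined")
    return peak_height / noise


# Hypothetical intensities from a manually selected, peak-free noise region.
noise_region = [10.0, 12.0, 9.0, 11.0, 8.0]
# Hypothetical baseline-subtracted height of the peak of interest.
peak_height = 400.0

print("peak-to-peak S/N:", snr_peak_to_peak(peak_height, noise_region))
print("standard deviation S/N:", round(snr_standard_deviation(peak_height, noise_region), 2))
```

Both functions need `noise_region` as an input, which is the manual selection step that makes these approaches impractical when thousands of unknown features are processed.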

For targeted and untargeted metabolite screening, the MQ4 integration and relative noise algorithms should be selected. 

Figure 2. Process for defining processing parameters in SCIEX OS software, according to workflow

Figure 3. Project default settings. Top: Peak integration and signal-to-noise algorithms and Bottom: library searching algorithm, filtering and scoring settings.

Qualitative processing method parameters 

The quantitation and targeted identification workflow can be used for both the quantitative and qualitative analyses of known analytes and uses library searching to confirm the identity of analytes in the sample. There are 3 types of library searches that can be performed. 

Candidate search finds the best spectral match from all compounds in the selected libraries. This search will take the longest time to process. 

Confirmation search uses the name in the components list to match against compound names in the libraries. The name in the XIC list must exactly match the name in the library or no hit will be found. This should only be used in a targeted identification search, where the target list has been generated from the library compound list.

Smart confirmation search first searches for matching names, and then searches using the spectra if no match is found. Smart confirmation search is recommended for this workflow as the processing time will be the same as the confirmation search if the compound name exists in the library. This search strategy comes with the added flexibility to perform candidate searching when needed. This is the most used search algorithm.
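
The two-step logic of the smart confirmation search can be sketched as follows. This is a minimal illustration, not the software's implementation: the library contents are made up, the function names are hypothetical, and the score (the fraction of library fragments found in the unknown) is a deliberately simple placeholder for a real spectral similarity score.

```python
def shared_peak_fraction(unknown_peaks, library_peaks):
    """Placeholder score: fraction of library peaks also seen in the unknown."""
    library_set = set(library_peaks)
    if not library_set:
        return 0.0
    return len(set(unknown_peaks) & library_set) / len(library_set)


def smart_confirmation_search(component_name, unknown_peaks, library):
    """Return (library name, score, search mode) for one component."""
    # Step 1: confirmation search. An exact name match is scored directly.
    if component_name in library:
        score = shared_peak_fraction(unknown_peaks, library[component_name])
        return component_name, score, "confirmation"
    # Step 2: candidate search. Score every library entry and keep the best.
    best_name, best_score = None, 0.0
    for name, peaks in library.items():
        score = shared_peak_fraction(unknown_peaks, peaks)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, "candidate"


# Hypothetical library: compound name -> rounded fragment m/z values.
library = {
    "Compound A": [50.0, 75.0, 120.0],
    "Compound B": [60.0, 91.0, 130.0],
}

named_hit = smart_confirmation_search("Compound A", [50.0, 75.0, 120.0], library)
fallback_hit = smart_confirmation_search("Feature 17", [60.0, 91.0, 200.0], library)
print(named_hit)
print(fallback_hit)
```

When the name is found, only one library entry is scored, which is why the processing time matches that of a confirmation search. The full library is scanned only for names that are not found.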

Once the results have been obtained from the library search, it is important to assess the confidence of each of the identifications. There are 3 scores that can be computed to sort the results.

Purity score uses all peaks from both spectra when calculating the score. A high value indicates a high probability that the unknown spectrum has been correctly identified and does not contain significant amounts of peaks from additional compounds. A lower value indicates that the match is less certain or additional fragment ion peaks from another compound are present in the unknown spectrum. This is the most used method to sort results. 

Fit score is calculated based on the library spectrum and it ignores peaks that are present only in the unknown spectrum. The fit score describes how well the library spectrum is represented in the unknown spectrum. A high fit score with a low purity score indicates that the unknown spectrum is likely impure but contains library compounds.

Reverse fit score is calculated based on peaks that are present in the unknown spectrum and it ignores peaks present only in the library spectrum. The reverse fit score describes how well the unknown spectrum is represented in the library spectrum.
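
The difference between the 3 scores lies in which peaks are allowed to contribute. The sketch below illustrates this with a cosine similarity evaluated over different peak sets. The cosine measure is an assumption chosen for illustration, as the actual scoring formula used by the software is not described here, and the spectra are made-up examples.

```python
import math


def _cosine(keys, spectrum_a, spectrum_b):
    """Cosine similarity (scaled 0-100) using only the m/z values in keys."""
    dot = sum(spectrum_a.get(k, 0.0) * spectrum_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(spectrum_a.get(k, 0.0) ** 2 for k in keys))
    norm_b = math.sqrt(sum(spectrum_b.get(k, 0.0) ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return 100.0 * dot / (norm_a * norm_b)


def purity_score(unknown, library):
    """Uses all peaks from both spectra."""
    return _cosine(set(unknown) | set(library), unknown, library)


def fit_score(unknown, library):
    """Ignores peaks that are present only in the unknown spectrum."""
    return _cosine(set(library), unknown, library)


def reverse_fit_score(unknown, library):
    """Ignores peaks that are present only in the library spectrum."""
    return _cosine(set(unknown), unknown, library)


# Hypothetical spectra as {m/z: intensity}. The unknown contains the library
# peaks plus one extra peak (m/z 60) from a co-eluting compound.
library_spectrum = {100.0: 100.0, 80.0: 50.0}
unknown_spectrum = {100.0: 100.0, 80.0: 50.0, 60.0: 80.0}

purity = purity_score(unknown_spectrum, library_spectrum)
fit = fit_score(unknown_spectrum, library_spectrum)
reverse_fit = reverse_fit_score(unknown_spectrum, library_spectrum)
print(round(purity, 1), round(fit, 1), round(reverse_fit, 1))
```

In this example the fit score is high while the purity score is lower, which is the pattern described above for an impure unknown spectrum that still contains the library compound.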

Default settings for quantitative processing and peak integration should be set for each project and cannot be altered within the processing method. Qualitative processing method parameters can be set for the project and can also be modified in the processing method.

SCIEX OS software: Targeted and non-targeted metabolite screening


When performing untargeted data analysis, select either the quantitation and targeted identification or non-targeted screening workflow. The selected project default parameters are used for peak integration and are saved in the processing method file.

For targeted identification, a component list must be entered. This can be inputted manually, imported from a text file or generated from a library database. Inputting the chemical formula and selecting the adduct/charge will automatically calculate the precursor mass (Da). The peak integration will then be performed using an example data file. This can be reviewed before proceeding or found later in the results table.
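
The precursor mass calculation from a chemical formula and an adduct can be illustrated with the short sketch below. It is an independent illustration rather than the software's own routine: it handles only simple formulas without parentheses, covers a small subset of elements and singly charged adducts, and uses monoisotopic masses rounded to 6 decimal places. All names are hypothetical.

```python
import re

# Monoisotopic masses (Da), rounded to 6 decimal places.
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
}

# Mass shift (Da) for a few singly charged adducts, electron mass included.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M-H]-": -1.007276,
    "[M+Na]+": 22.989221,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


def precursor_mz(formula, adduct):
    """Precursor m/z for a singly charged adduct of the given formula."""
    return monoisotopic_mass(formula) + ADDUCT_SHIFT[adduct]


# Example: glucose (C6H12O6) as the protonated and deprotonated species.
print(round(precursor_mz("C6H12O6", "[M+H]+"), 4))
print(round(precursor_mz("C6H12O6", "[M-H]-"), 4))
```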

If non-targeted screening is used, a component list can be used if desired but the non-targeted search parameters must be defined. If the processing method contains the targeted analytes, then the customized integration parameters for the targeted components will not affect the non-targeted peak integration. To change the project default parameters, the user must create a new non-targeted method. If the parameters are changed in an existing method, the changed parameters will not be implemented. 

Library search parameters

Library search parameters have a big impact on the coverage and the quality of spectral matching (Fig. 4). Precursor mass tolerance, collision energy, retention time and polarity settings can be used to filter the data. Precursor mass tolerance should be set according to data type. For accurate mass data acquired on a high-resolution instrument, the precursor mass tolerance should be set to 0.02 Da. Unchecking the Collision Energy and Collision Energy Spread boxes will allow more spectra to be returned. Using CE and CES is recommended if the experimental conditions are the same as those used to build the library, as this will return the most reliable results. Retention time (RT) should only be selected if the search is against a library that includes RTs and if the same chromatographic separation conditions are used. Metabolomics data are typically acquired in both polarities, so selecting this filter will allow library components to be matched correctly to data acquired in the same polarity. 
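
The filtering role of the precursor mass tolerance and polarity settings can be sketched as a simple pre-selection of library entries. The data structure, entry values and function name below are hypothetical and only illustrate the idea; the 0.02 Da default follows the recommendation above for accurate mass data.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    name: str
    precursor_mz: float
    polarity: str  # "positive" or "negative"


def filter_candidates(entries, precursor_mz, polarity,
                      mass_tolerance_da=0.02, use_polarity=True):
    """Keep library entries that pass the precursor mass and polarity filters."""
    selected = []
    for entry in entries:
        if abs(entry.precursor_mz - precursor_mz) > mass_tolerance_da:
            continue  # outside the precursor mass tolerance
        if use_polarity and entry.polarity != polarity:
            continue  # acquired in the other polarity
        selected.append(entry)
    return selected


# Hypothetical library entries for illustration only.
entries = [
    LibraryEntry("Compound A", 181.0707, "positive"),
    LibraryEntry("Compound B", 181.0750, "negative"),
    LibraryEntry("Compound C", 181.1200, "positive"),
]

hits = filter_candidates(entries, 181.0712, "positive")
print([entry.name for entry in hits])
```

Only the entries that survive this step would then be scored against the acquired MS/MS spectrum, which is why tighter filters shorten processing time but can also exclude the correct compound.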

Figure 4. Library search parameters for DDA data acquired on the ZenoTOF 7600 system

These settings will determine which spectra in the library will be evaluated against the acquired spectra. Good filtering will improve processing time and reduce false positives. Incorrect filtering will cause false negatives. The remaining parameters will affect how the results are scored. Fragment mass tolerance, intensity threshold, minimal purity and intensity are important to consider when comparing data to a library acquired without using the Zeno trap because of differences in the intensities of fragments. 

Intensity threshold sets which fragment ions are used in the search. The default value of 0.05 corresponds to 5%, so any peak below 5% of the base peak is ignored. This value can be used to remove small noise peaks from the spectrum and improve purity scores. Decreasing it allows more of the smaller peaks, which might be important for identifying a compound, to be considered during spectral searching. For processing our Zeno DDA data, this value was set to 0.025 (2.5%).
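
The effect of the intensity threshold can be sketched as follows. The peak list is made up and the function name is hypothetical. The sketch assumes that the setting is a fraction of the base peak intensity, as the 0.05 = 5% default implies, and that the lower value of 0.025 is expressed on the same scale.

```python
def apply_intensity_threshold(peaks, threshold=0.05):
    """Drop fragment peaks below threshold * base peak intensity.

    peaks is a list of (m/z, intensity) tuples.
    """
    base_peak_intensity = max(intensity for _, intensity in peaks)
    cutoff = threshold * base_peak_intensity
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


# Hypothetical MS/MS peak list: one base peak and three minor fragments.
peaks = [(50.0, 1000.0), (60.0, 40.0), (70.0, 30.0), (80.0, 10.0)]

default_peaks = apply_intensity_threshold(peaks, threshold=0.05)
relaxed_peaks = apply_intensity_threshold(peaks, threshold=0.025)
print(len(default_peaks), "peak(s) kept at 0.05")
print(len(relaxed_peaks), "peak(s) kept at 0.025")
```

With the lower threshold, two additional low-intensity fragments remain available for spectral matching, while the smallest peak is still treated as noise.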

Minimal purity indicates how well the sample and library spectra match. All peaks from both spectra are used. High values indicate a higher likelihood that the unknown spectrum has been correctly identified and that it does not contain significant amounts of peaks from additional compounds. Lower values indicate that the match is less certain or that fragment ion peaks from another compound are present in the unknown spectrum. Increasing the value of this setting reduces the frequency of lower quality hits. 

Intensity factor compensates for differences in peak height between the unknown spectrum and the library spectra. The larger the intensity factor, the greater the difference allowed between the unknown spectrum and the library while still getting a high score. Increasing this value will remove the significance of relative intensity. When comparing spectra generated with the Zeno trap, the intensity factor should be set to 20 to reduce the importance of variation in the ratio of fragment ions between the unknown and the library spectra. Increasing the intensity factor also increases the purity and these settings change the score calculation. A low intensity factor can yield low scores for true hits, whereas a large intensity factor might introduce false positives. 

Figure 5. Confidence level settings. Found under the Flagging Rules tab, these settings enable easy identification of compounds in the results table that may require further investigation before assigning identity

Confidence level settings – Flagging rules

Confidence level settings establish the acceptable boundaries by which data is matched to the spectral libraries and are used to filter any proposed metabolite identifications and characterize the quality of the match. These levels can be set in the Flagging Rules tab (Fig. 5). A score is calculated for the sample metabolite for all criteria selected. A combined score can be generated based on weighting all parameters. The weighting of each qualitative rule can be adjusted according to the number of rules but should total 100%. The results table can be filtered using this traffic light system in which metabolites that fall within the defined confidence intervals will be highlighted in green for acceptable, yellow for marginal and red for unacceptable.
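
The weighting and traffic light ideas can be sketched as below. The rule names, per-rule scores, weights and color boundaries are all hypothetical values chosen for illustration, and the weighted sum is an assumed way of combining the rules; only the requirement that the weights total 100% comes from the description above.

```python
def combined_score(rule_scores, rule_weights):
    """Weighted sum of per-rule scores; weights are percentages totaling 100."""
    total_weight = sum(rule_weights.values())
    if abs(total_weight - 100.0) > 1e-9:
        raise ValueError(f"weights must total 100%, got {total_weight}")
    return sum(rule_scores[rule] * weight / 100.0
               for rule, weight in rule_weights.items())


def traffic_light(score, acceptable=80.0, marginal=50.0):
    """Map a score to green (acceptable), yellow (marginal) or red."""
    if score >= acceptable:
        return "green"
    if score >= marginal:
        return "yellow"
    return "red"


# Hypothetical per-rule scores (0-100) for one metabolite and their weights (%).
rule_scores = {"mass error": 90.0, "library hit": 80.0, "isotope ratio": 50.0}
rule_weights = {"mass error": 40.0, "library hit": 40.0, "isotope ratio": 20.0}

score = combined_score(rule_scores, rule_weights)
print(score, traffic_light(score))
```

Raising the weight of one rule requires lowering another, so the weights express the relative trust placed in each line of evidence.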

Qualitative rules include mass error (ppm), fragment mass error (ppm), error in retention time, % difference in isotope ratio, library hit score and formula finder score. When performing targeted identification, all but the fragment mass error qualitative rule can be selected. To include a fragment mass error, it must be entered in the components list. 
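
For reference, the mass error in ppm that underlies the first two rules is the difference between the measured and theoretical m/z, divided by the theoretical m/z and multiplied by one million. The sketch below shows the arithmetic with made-up m/z values; the function name is illustrative.

```python
def mass_error_ppm(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical example: a 0.0005 Da deviation at m/z 181.0707.
error = mass_error_ppm(181.0712, 181.0707)
print(f"{error:.2f} ppm")
```

Because the error is relative, the same absolute deviation in Da corresponds to a larger ppm error at low m/z than at high m/z.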

Formula finder

The formula finder algorithm tries to predict the possible chemical formula based on the MS and MS/MS spectra, precursor mass accuracy, isotopic pattern and MS/MS fragmentation.

Formula predictions are made from a defined list of elements and mass tolerance that can be found under the Advanced Tab (Fig. 6). First, select from either naturally occurring compounds or synthetic compounds. In either case, the type of element and/or number of elements to consider must be entered in the Limits section. Proposed formulas are scored based on precursor mass accuracy and average MS/MS mass accuracy of matching fragments. The MS spectrum contributes 67% to the final formula finding score and the MS/MS spectrum contributes 33%. As a result, the ability of the formula to predict the MS mass is the primary influence on the score. However, the matching of the MS/MS fragments also influences the score. The isotope pattern is used to generate the list of found formulas but it is not used to generate the final score. Therefore, a formula with the wrong isotope pattern will probably not be included in the list. A list of possible formulas is determined using precursor mass accuracy, isotopic pattern and MS/MS fragmentation. A high formula finding score does not guarantee that the sample compound is the one identified by the formula finding algorithm because several formulas often match within the mass error. Care must be taken and other confirmatory testing must be done before a compound is identified using formula finding.
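
The 67%/33% weighting can be illustrated with a short calculation. The sketch assumes that the MS and MS/MS contributions are each available as a sub-score on a 0 to 100 scale and that they are combined linearly; how the software derives the sub-scores is not described here, and the candidate values are hypothetical.

```python
MS_WEIGHT = 0.67    # contribution of the MS spectrum, from the text above
MSMS_WEIGHT = 0.33  # contribution of the MS/MS spectrum, from the text above


def formula_finder_score(ms_score, msms_score):
    """Assumed linear combination of MS and MS/MS sub-scores (0-100 each)."""
    return MS_WEIGHT * ms_score + MSMS_WEIGHT * msms_score


# Hypothetical candidates: name -> (MS sub-score, MS/MS sub-score).
candidates = {
    "candidate 1": (95.0, 60.0),
    "candidate 2": (70.0, 100.0),
}

ranked = sorted(candidates,
                key=lambda name: formula_finder_score(*candidates[name]),
                reverse=True)
for name in ranked:
    print(name, round(formula_finder_score(*candidates[name]), 2))
```

In this example the candidate with the better MS sub-score ranks first even though its MS/MS sub-score is much lower, which reflects the primary influence of the MS mass on the final score.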

Figure 6. Formula Finder settings. An empirical formula can be predicted based on the TOF MS and TOF MS/MS spectra acquired for DDA experiments. These predictions are influenced by the limits on the elemental composition and mass tolerance settings.

SCIEX OS software: Results


After defining the required algorithms and processing parameters discussed above, a results table will be generated (Fig. 7). Extracted ion chromatograms (XICs) are generated for all metabolites in the library based on thresholds set by the user, such as formula and expected retention time of all target analytes. The MS and MS/MS information is automatically evaluated if the detected XIC signal exceeds the user-defined intensity threshold or signal-to-noise (Figure ). Data processing results are ranked based on 4 selectivity criteria to provide a high degree of confidence in assigning compound identifications to detected compounds:

  • Retention time matching 
  • Mass accuracy 
  • Isotope pattern fit 
  • MS/MS library searching

In addition, the peak intensity of the analyte can be compared to that of a standard sample of known concentration to obtain quantitative information. 

Formula finder results are shown in Figure 7. Clicking on Formula Finder Results will show additional potential candidates. The chemical structure of the selected formula finder results is also shown in the table if the compound has been updated from ChemSpider. By clicking on the ChemSpider icon in the results table, a ChemSpider session is initiated, and a list of all suggested compounds that match the selected formula will be generated. Selecting a compound in the results list in the app will display the chemical structure, acquired vs matched spectra and a table of matching fragments, together with their mass errors and elemental composition.

Figure 7. Results table generated from a targeted identification workflow in SCIEX OS software. Results have been filtered to show only those meeting the highest confidence for each criterion. 

MarkerView software: Statistical analysis


The MarkerView software is designed to allow the data from several samples to be compared so that differences can be identified. The program uses multivariate analysis (MVA) techniques to compare the samples and provides both supervised and unsupervised methods (Figs. 8 and 9). Supervised methods use prior knowledge about the sample groups (for example, healthy vs diseased) to determine the variables that distinguish the groups. In contrast, unsupervised methods allow the structure within the data to be determined and visualized. The 2 approaches can be combined, for example, as unsupervised methods can be used to determine the groups and then supervised methods can be used to confirm the important variables. 

The first stage of data analysis is to perform an unbiased review of the detected features. Using the wizard, the raw data files (*.wiff or *.wiff2) are imported into the software without any prior processing. Data can be assessed by performing a PCA analysis. The resulting scores and loading plots enable us to check data reproducibility and quickly identify any outliers. A t-test between groups of interest can show differences between individual features across samples. A volcano plot of p value vs fold-change highlights any significant changes across the samples. These metabolites can be added to an interest list. These processing steps are used to identify putative features of interest. Data must then be processed in SCIEX OS software using a spectral library to identify the selected features.
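
The coordinates of a volcano plot can be sketched as follows. The example assumes that group means and p values have already been obtained (for example, from the t-test step) and only shows how each feature is placed on the plot and selected. The feature values, the cut-offs and the function names are hypothetical and do not reflect MarkerView software defaults.

```python
import math


def volcano_point(mean_control, mean_treated, p_value):
    """Return (log2 fold-change, -log10 p value) for one feature."""
    log2_fold_change = math.log2(mean_treated / mean_control)
    neg_log10_p = -math.log10(p_value)
    return log2_fold_change, neg_log10_p


def is_of_interest(log2_fold_change, neg_log10_p,
                   min_abs_log2_fc=1.0, max_p_value=0.05):
    """Flag features that change at least 2-fold and pass the p value cut-off."""
    return (abs(log2_fold_change) >= min_abs_log2_fc
            and neg_log10_p >= -math.log10(max_p_value))


# Hypothetical features: (name, control mean, treated mean, p value).
features = [
    ("feature 1", 100.0, 400.0, 0.001),
    ("feature 2", 100.0, 110.0, 0.5),
    ("feature 3", 100.0, 20.0, 0.01),
]

selected = [name for name, control, treated, p in features
            if is_of_interest(*volcano_point(control, treated, p))]
print(selected)
```

Features in the upper left and upper right corners of the plot, which are both strongly changed and statistically significant, are the ones that would typically be added to an interest list.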

Figure 8. Unsupervised PCA analysis. Panel A: Two groups of data, for example treated vs control samples, can be processed via PCA analysis to identify outlier metabolites in the treated group from the loadings plot. These data can be visualized by highlighting the data point(s) from the loadings plot to generate either a profile plot or a box and whiskers plot, as shown in Panel B.

Once the features have been identified, a second statistical analysis using the peak list of identified metabolites can be performed. When the peak list is imported into MarkerView software, the names of the metabolites are shown instead of the feature’s m/z and RT. PCA analysis and t-tests were then performed on our sample sets. The resulting volcano plot shows which metabolites have the largest fold-change and which are significantly different between the samples. This approach can enable fast identification of features of interest, which may reveal important biological insights.

Figure 9. Volcano plot data visualization from MarkerView software. Data can be visualized via a volcano plot (Panel A) and, as in Figure 8, highlighted data can be visualized quantitatively as a profile plot or a box and whiskers plot (Panel B).

Conclusion
 

  • High-resolution, accurate mass MS/MS data acquired on the ZenoTOF 7600 system at a fixed collision energy may not match as confidently to spectral libraries acquired without the Zeno trap. Herein, optimized parameter settings are provided.

  • SCIEX OS software allows targeted identification and quantitation and untargeted screening of metabolites. This high degree of flexibility enables the user to optimize processing parameters for a given workflow and/or data type.

  • Algorithm and processing parameter choices will impact how many metabolites are confidently identified when comparing Zeno DDA data with conventional spectral libraries

  • MarkerView software together with SCIEX OS software provides a complete solution to a metabolomics workflow

References
 

  1. Baker, PRS and Proos R. Untargeted data-dependent acquisition (DDA) metabolomics analysis using the ZenoTOF 7600 system. SCIEX technical note, RUOMKT-02-15367-A