Streamlined workflow for untargeted metabolomics 

Employing XCMSplus Software, a simultaneous data processing and metabolite identification software for rapid untargeted metabolite screening

Baljit K. Ubhi1, H. Paul Benton2, Duane Rinehart2 and Gary Siuzdak2
1SCIEX, USA; 2Scripps Center for Metabolomics and Mass Spectrometry, The Scripps Research Institute, La Jolla, USA

Abstract

XCMSplus software can accelerate metabolomics project time from weeks to days with local processing and batch analysis. With unlimited data storage capacity, robust data processing and intuitive visualizations, XCMSplus offers the complete solution for untargeted metabolomics research.


Introduction

The conventional mass spectrometry metabolomics workflow tends to be a two-step process. Data is first collected in MS1 mode and processed to find any differential features of interest. Data is then re-acquired, and the MS/MS information collected is used for metabolite identification by manual database searching. XCMS software has been described as the “World’s most cited metabolomics software”, with more than 1000 citations in the literature, and it is widely trusted in the metabolomics community.

A streamlined workflow using the TripleTOF® System and XCMSplus Software combines both MS and MS/MS data collection into a single-injection workflow and provides simultaneous data processing and metabolite identification. This brings the overall process down from weeks to days, giving a more efficient approach that moves the researcher from sample to biology in a timely fashion.

XCMSplus provides a host of features including multivariate statistical analyses and global visualization approaches such as the interactive cloud plot. With improved multi-group analysis capabilities, faster on-site data processing, unlimited storage and customization to their local environment, metabolomics researchers will be able to accelerate their discovery workflows and shorten the time needed to translate data into biological information. As a proof of concept, data from a previously acquired Zucker rat study (on the TripleTOF system) was processed through this streamlined workflow using XCMSplus software.
 

Figure 1. Untargeted metabolite workflow using the TripleTOF® System. Collect data using an untargeted approach and process the data using the XCMSplus software. XCMSplus is optimized for untargeted metabolite screening. By combining raw data processing and retention time correction with statistical analysis, the software identifies and quantifies endogenous metabolites that vary between samples.

Key features of untargeted metabolite screening using XCMSplus Software

  • Fast acquisition of high resolution, accurate mass MS and MS/MS data on the TripleTOF Systems enables a single injection data acquisition strategy
  • Data can be loaded, processed and reviewed in a single interactive workspace
  • Streamlined and simplified data extraction parameters
  • Submit and monitor the status of multiple jobs simultaneously
  • Statistical analysis including paired/unpaired t-tests, parametric and non-parametric testing (including FDR), and multivariate data analysis techniques
  • Simplified metabolite identification by linking the data to a composite database
  • Unlimited data storage capabilities

 

Methods

Sample Preparation: Plasma aliquots from Zucker rats (11 lean, 9 fatty, and 10 obese phenotypes) were obtained from Bioreclamation (Westbury, NY, USA). Animals were fed ad libitum chow which was 18% protein, 6% fat. The plasma was obtained from a terminal bleed at 7-9 weeks old and was preserved only with EDTA and maintained at -80 °C. Protein was removed from the samples by the addition of 8 volumes of cold methanol then centrifuged at 10,000 rpm in a bench-top centrifuge and the supernatant retained.

LC-MS/MS analysis: Using an Agilent 1290 HPLC system and a high strength silica column (Acquity HSS T3, 1.8 µm, 2.1 x 100 mm, Waters, Milford, USA) at 60 °C, polar metabolites from a 5 µL injection of serum were separated at a flow rate of 600 µL/min. Full scan TOF MS and MS/MS data was acquired on a TripleTOF 5600+ System (SCIEX) in data-dependent mode. A pooled sample, used as a QC, was also acquired every 5 samples to monitor data reproducibility.
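
The QC spacing described above can be sketched as a simple run-order builder. This is only an illustration in Python, not the acquisition batch actually used: the function name `build_run_order`, the sample labels and the choice to place a QC injection strictly after each block of five samples are all assumptions.

```python
def build_run_order(samples, qc_label="QC_pool", qc_every=5):
    """Return an injection list with a pooled QC after every `qc_every` samples."""
    order = []
    for index, sample in enumerate(samples, start=1):
        order.append(sample)
        if index % qc_every == 0:
            order.append(qc_label)
    return order

# Hypothetical labels for the 30 plasma samples (11 lean, 9 fatty, 10 obese).
samples = (
    [f"lean_{i:02d}" for i in range(1, 12)]
    + [f"fatty_{i:02d}" for i in range(1, 10)]
    + [f"obese_{i:02d}" for i in range(1, 11)]
)
run_order = build_run_order(samples)
```

In practice the sample order would normally be randomized before the QC injections are interleaved, so that phenotype is not confounded with run position.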

Data Processing: Data was processed in XCMSplus Software in both pairwise and multigroup job mode. The data files are automatically loaded and read by XCMSplus Software and converted for peak finding and RT correction/alignment. A pairwise job allows the comparison of two groups, whereas a multigroup job allows 3 or more groups of samples to be analyzed. MS data was compared against the composite database for metabolite identification, and MS/MS data was used for confirmation against the METLIN database, which is available online.2

Workflow Basics

The TripleTOF System is a highly flexible and powerful MS system that can be used for both untargeted and targeted metabolomics as outlined in Figure 1. The workflow for untargeted metabolite screening using XCMSplus Software will be the focus of this work.

XCMSplus Software directly processes SCIEX LC-MS and LC-MS/MS *.wiff data files for feature detection and comparison. With the ability to process multiple jobs at once, the software provides a single interactive workspace for monitoring job status and for reviewing results. Data from other vendors, including Waters, Agilent, Thermo, LECO and Bruker, can also be processed.

Data can be processed in a number of ways, which are referred to as “jobs”. A pairwise job allows the comparison of two groups, for which paired and unpaired tests can be conducted. A multigroup job allows 3 or more groups of samples to be analyzed. Because the number of groups is different, so is the availability of statistical tests; here a user cannot conduct a one-way test. Both approaches allow principal component analysis (PCA), a multivariate analysis technique used to visualize data from multiple groups of samples and multiple variables (or features) in n-dimensional space. A t-test, by contrast, compares one variable at a time between two groups, e.g. tyrosine in the control sample versus tyrosine in the diseased sample.
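
To make the distinction between unpaired and paired tests concrete, the following minimal Python sketch computes both t statistics for a single feature. It is not the XCMSplus implementation: the function names, the choice of Welch's form for the unpaired test and the intensity values are assumptions made for illustration.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Unpaired (Welch) t statistic and degrees of freedom for one feature."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard error of each group mean.
    se_a = statistics.variance(group_a) / len(group_a)
    se_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df

def paired_t(first, second):
    """Paired t statistic: a one-sample test on the per-subject differences."""
    diffs = [b - a for a, b in zip(first, second)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Invented intensities for one feature measured in two groups.
control = [10.0, 12.0, 11.0, 13.0]
diseased = [15.0, 16.0, 17.0, 18.0]

t_unpaired, df_unpaired = welch_t(control, diseased)
# The paired form is only meaningful if position i in both lists is the same subject.
t_paired, df_paired = paired_t(control, diseased)
```

Converting either statistic to a p-value requires the t distribution, which is not in the Python standard library; `scipy.stats.ttest_ind` (with `equal_var=False`) and `scipy.stats.ttest_rel` return the statistic and p-value together.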
 

Figure 2. XCMSplus Software - interactive workspace. View the data both pre- and post-alignment/correction (uncorrected TIC/corrected TIC) and where any of the alignment/correction was applied (retention time deviation correction plot).

Efficiency of Data Processing

Forty IDA data files from a TripleTOF 5600 system were loaded into the XCMSplus Software and also uploaded to XCMS Online for comparison. With the online version, the data files took around 24 hours to upload and process: uploading this amount of data (~3.23 GB) took 4-5 hours, and processing took a further 16-18 hours. When the same 40 IDA data files were loaded and processed in XCMSplus Software, data loading took around one minute and processing took around 50-60 minutes. The comparison between the two approaches is therefore as follows:

  • Processing time for XCMS Online: ~24 hours
  • Processing time for XCMSplus: ~1 hour

When results are reviewed and the data needs to be re-processed, the online approach means allowing another 24 hours. With XCMSplus this time is greatly reduced, and a user can review the reprocessed results after about an hour.
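
The timing comparison in this section can be checked with simple arithmetic. The sketch below uses only the durations quoted above; the variable and function names are illustrative.

```python
# Durations reported in the text, in hours, as (low, high) ranges.
online_upload = (4, 5)
online_processing = (16, 18)
local_loading = (1 / 60, 1 / 60)        # "around one minute"
local_processing = (50 / 60, 60 / 60)   # "around 50-60 minutes"

def total(*ranges):
    """Add (low, high) ranges element-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))

online_total = total(online_upload, online_processing)
local_total = total(local_loading, local_processing)

# Least and most favorable ratios of online time to local time.
speedup_low = online_total[0] / local_total[1]
speedup_high = online_total[1] / local_total[0]
```

The itemized online steps sum to 20-23 hours, which the text rounds to about 24 hours, and the resulting ratio is roughly 20- to 27-fold in favor of local processing.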

Interactive Workspace with Powerful Statistics

XCMSplus Software outputs a results panel with an interactive workspace (Figure 2) where the extracted ion chromatogram of each significant feature can be reviewed. There is also an array of multivariate reports, from principal component analysis to an interactive heat map with hierarchical cluster analysis. The principal component analysis allows users to choose the scaling method best suited to the data and to see both scores and loadings plots with metabolite annotations. In the interactive heat map, both extracted ion chromatograms (XICs) and spectral plots are given for each feature. This allows for fast, efficient interaction with the data and understanding of the experiment.

Interrogating the LC-MS data from the Zucker rat study, the different phenotypes were observed to cluster together in the PCA Scores plot (Figure 3). The different phenotypes (fatty, lean and obese) are colored to show the sample groups. 
 

Figure 3. Principal component analysis (PCA) scores plot from XCMSplus results. Three phenotypes of rat were analyzed from the Zucker rat study, including a pooled sample (QC). Here the groups of samples can be seen to cluster together on the Scores plot based on that phenotype. In this case, a non-discriminant analysis was performed, meaning no prior knowledge of the groups was used for this visualization.

Working with the Results Table

From the scores plot, the user can “view results table”. The table can be sorted based on p-value significance or q-value significance (Figure 4, top). The q-value is the false discovery rate adjusted p-value. The feature table lists the retention time at which a particular feature was found as well as its intensity. Other columns of interest can be added by the user and used for sorting, such as m/z value, retention time (RT), correlation variation (Corr Var), maximum intensity (Max Int) and feature group (Feature Gp). Any isotopes and adducts are also identified, and feature grouping groups together any related ions.
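
As an illustration of how a false discovery rate adjusted p-value can be obtained, the sketch below implements the Benjamini-Hochberg procedure in Python. The source does not state which FDR procedure XCMSplus applies, so the choice of Benjamini-Hochberg, the function name and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never
    # increase as the p-value rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Invented p-values for four features.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
```

Sorting a feature table on the resulting q-values rather than the raw p-values accounts for the fact that many features are tested at once.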

From the feature table, a feature can be highlighted and its XIC visualized as an overlay from all the samples processed. One can then review the raw data and see whether this is an actual feature, just noise, or an irrelevant peak eluting in the solvent front. Retention time information as well as the accurate mass is displayed in this XIC plot (Figure 4, middle). The mass spectrum for that XIC can be viewed, as well as a simple box and whisker plot highlighting the differences between the groups. Finally, each feature is matched for identification to the composite database. If MS/MS spectra are available, the column labelled “MS/MS” is populated with “y”. The database identifications are listed in a table (Figure 4) and ranked by m/z error (ppm) and then alphanumerically. Any MS/MS confirmation can then be made using the METLIN website (https://metlin.scripps.edu).
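
The ranking of database hits described above can be sketched as follows. This is a minimal Python illustration, not the software's own logic: the candidate names and m/z values are invented, and ranking on the absolute ppm error rounded to two decimals is an assumption.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

observed_mz = 182.0812  # hypothetical measured m/z for one feature

# Hypothetical database hits as (name, theoretical m/z).
candidates = [
    ("Candidate B", 182.0817),
    ("Candidate A", 182.0817),
    ("Candidate C", 182.0810),
]

# Rank by absolute ppm error first, then by name to break ties.
ranked = sorted(
    candidates,
    key=lambda c: (round(abs(ppm_error(observed_mz, c[1])), 2), c[0]),
)
```

Here the hit closest in mass comes first, and the two hits with identical theoretical m/z are ordered by name.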
 

Figure 4. View results table page. This page includes the feature table (top), which lists all features picked in the peak finding step during processing. Features can be ordered by any column, such as p-value or q-value. A feature can be highlighted and the XIC information across samples, the area differences between samples (box and whisker plot) and the m/z information can be viewed (middle). Finally, identification information (bottom) is shown.

Interactive Cloud Plot

The most compelling visualization tool in XCMSplus software is the interactive cloud plot (Figure 5). Key visualization features include:

  • P-value is represented by how dark or light the color is.
  • Fold change is represented by the radius of each feature.
  • Retention time is represented by position on the x-axis.
  • Mass-to-Charge ratio is represented by position on y-axis.
  • Sliders for p-value and fold change are in the Main Panel.
  • Sliders for intensity, retention time, and mass-to-charge are in the Advanced Tab.
  • A link to the table representation of the graph is displayed at the bottom along with the settings used to generate the graph.

The cloud plot is completely interactive and can be filtered on any of the parameters listed above. More stringent thresholds can be applied to the p-value and to the mean fold change so that only the most significant features are displayed.
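
The effect of tightening the p-value and fold change sliders can be sketched as a filter over a feature table. The Python below is an illustration only: the field names, the thresholds, the example values and the convention that fold change is a ratio (so that values below 1 indicate down-regulation) are assumptions.

```python
import math

# Hypothetical feature records: retention time (min), m/z, p-value, fold change.
features = [
    {"rt": 1.2, "mz": 182.08, "p": 0.0004, "fold": 3.5},
    {"rt": 5.8, "mz": 496.34, "p": 0.0200, "fold": 1.4},
    {"rt": 7.1, "mz": 524.37, "p": 0.0080, "fold": 0.25},
]

def filter_features(rows, max_p=0.01, min_abs_log2_fold=1.0):
    """Keep features that pass both the p-value and fold change thresholds."""
    return [
        row for row in rows
        if row["p"] <= max_p and abs(math.log2(row["fold"])) >= min_abs_log2_fold
    ]

def direction(row):
    """Label a feature by the sign of its log fold change."""
    return "up" if math.log2(row["fold"]) > 0 else "down"

kept = filter_features(features)
labels = [direction(row) for row in kept]
```

With these thresholds the second feature is dropped because its p-value exceeds 0.01, leaving one up-regulated and one down-regulated feature.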

The end result from XCMSplus is a confirmed list of significant metabolites. In the Zucker rat study, many lipid changes were observed among the three phenotypes studied (lean, fatty and obese), as well as changes in other small metabolites including bile acids.

Figure 5. Interactive cloud plot. The cloud plot displays features whose intensities are altered between sample groups. In this case 1357 features were highlighted as having a p-value of less than 0.01. Up-regulated features are represented as circles on the top of the plot and down-regulated features are represented as circles on the bottom of the plot, where the size and the degree of color saturation correspond to the (log) fold change of the feature. Circles with black outlines indicate hits in the database. The lighter or darker the color of the circle relates directly to the significance of the p-value. The cloud plot is completely interactive: the user can filter on p-value, m/z value and retention time to see more or fewer features based on the filtering criteria.

Conclusions

XCMSplus software can accelerate project completion time from weeks to days with local processing and batch analysis. With unlimited data storage capacity and the security of a local desktop package, XCMSplus offers the complete solution for untargeted metabolomics research. Beyond the robust data processing and visualization features of XCMSplus, virtually the entire metabolomics community can privately (and publicly) share their data and results within the XCMSplus software. 


References

  1. Benton H.P. and Ivanisevic J. et al. (2015) Autonomous metabolomics for rapid metabolite identification in global profiling. Analytical Chemistry, 87(2): 884-91.
  2. METLIN database. https://metlin.scripps.edu
  3. Advanced data processing software enables autonomous metabolite identification on the X500R QTOF System. SCIEX technical note RUO-MKT-02-6659-A.