Employing XCMSplus Software, a simultaneous data processing and metabolite identification software for rapid untargeted metabolite screening
Baljit K. Ubhi1, H. Paul Benton2, Duane Rinehart2 and Gary Siuzdak2
1SCIEX, USA, 2Scripps Center for Metabolomics and Mass Spectrometry, The Scripps Research Institute, La Jolla, USA
The conventional mass spectrometry metabolomics workflow tends to be a two-step process. Data is collected using MS1 and then processed to find any differential features of interest. Data is then re-acquired, and any MS/MS information collected is used for metabolite identification and manual database searching. XCMS software is the “World’s most cited metabolomics software”, with over 1000 citations in the literature, and is widely trusted in the metabolomics community.
A streamlined workflow using the TripleTOF® System and XCMSplus Software combines both MS and MS/MS data collection into a single-injection workflow and provides simultaneous data processing and metabolite identification. This brings the overall process down from weeks to days, allowing a more efficient approach that moves the researcher from sample to biology in a timely fashion.
XCMSplus provides a host of features including multivariate statistical analyses and global visualization approaches such as the interactive cloud plot. With improved multi-group analysis capabilities, faster on-site data processing, unlimited storage and customization to the local environment, metabolomics researchers can accelerate their discovery workflows and shorten the time needed to translate data into biological information. As a proof of concept, data from a previously acquired Zucker rat study (on the TripleTOF system) was processed through this streamlined workflow using XCMSplus software.
Sample Preparation: Plasma aliquots from Zucker rats (11 lean, 9 fatty, and 10 obese phenotypes) were obtained from Bioreclamation (Westbury, NY, USA). Animals were fed ad libitum on chow containing 18% protein and 6% fat. The plasma was obtained from a terminal bleed at 7-9 weeks of age, preserved only with EDTA and maintained at -80 °C. Protein was removed from the samples by adding 8 volumes of cold methanol; the samples were then centrifuged at 10,000 rpm in a bench-top centrifuge and the supernatant was retained.
LC-MS/MS analysis: Using an Agilent 1290 HPLC system and a high strength silica column (Acquity HSS T3 1.8 µm, 2.1 x 100 mm, Waters, Milford, US) at 60 °C, polar metabolites from a 5 µL injection of serum were separated at a flow rate of 600 µL/min. Full scan TOF MS and MS/MS data was acquired on a TripleTOF 5600+ System (SCIEX) in data-dependent mode. A pooled sample, used as a QC, was also acquired every 5 samples and used to monitor data reproducibility.
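The QC scheme described above, with a pooled sample acquired every 5 samples, can be sketched as a simple run-order builder. This is an illustration only: the function name, the sample labels and the choice to place the QC after each block of five are assumptions, not details of the original acquisition method.

```python
def build_run_order(samples, qc_label="QC_pool", every=5):
    """Return an injection list with a pooled QC after every `every` samples."""
    order = []
    for position, sample in enumerate(samples, start=1):
        order.append(sample)
        if position % every == 0:
            order.append(qc_label)  # pooled QC closes each block of samples
    return order


# 30 hypothetical labels standing in for the 11 lean, 9 fatty and 10 obese samples
samples = [f"rat_{n:02d}" for n in range(1, 31)]
run_order = build_run_order(samples)
qc_count = run_order.count("QC_pool")
print(len(run_order), qc_count)
```

Tracking the pooled QC across the run gives a direct view of retention time and intensity drift, which is what makes it useful for monitoring reproducibility.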
Data Processing: Data was processed in XCMSplus Software in both pairwise and multigroup job modes. The data files are automatically loaded and read by XCMSplus Software and converted for peak finding and RT correction/alignment. A pairwise job allows the comparison of two groups, whereas a multigroup job allows 3 or more groups of samples to be analyzed. MS data was compared against the composite database for metabolite identification, and MS/MS data was used for confirmation against the METLIN database, which is available online.2
The TripleTOF System is a highly flexible and powerful MS system that can be used for both untargeted and targeted metabolomics as outlined in Figure 1. The workflow for untargeted metabolite screening using XCMSplus Software will be the focus of this work.
XCMSplus Software directly processes SCIEX LC-MS and LC-MS/MS *.wiff data files for feature detection and comparison. With the ability to process multiple jobs all at once, the software provides a single interactive workspace for monitoring job status and for reviewing results. Data can be processed from multiple vendors including Waters, Agilent, Thermo, LECO and Bruker.
Data can be processed in a number of ways, which are referred to as “jobs”. A pairwise job allows the comparison of two groups, for which either a paired or an unpaired test can be conducted. A multigroup job allows 3 or more groups of samples to be analyzed. Because the number of groups differs, so does the availability of statistical tests; here a user cannot conduct a one-way test. Both approaches allow principal component analysis (PCA), a multivariate analysis technique used to visualize data from multiple groups of samples and multiple variables (or features) in n-dimensional space. A t-test, by contrast, compares one variable at a time in two groups, e.g. tyrosine in the control sample versus tyrosine in the diseased sample.
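To make the univariate comparison concrete, the following minimal sketch computes an unpaired t statistic for a single feature in two groups. It assumes Welch's unequal-variance form, which the text does not specify, and the intensity values are invented for illustration. It is not the XCMSplus implementation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two unpaired groups."""
    # Squared standard error of each group mean (sample variance / n)
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical intensities for one feature (e.g. tyrosine) in two groups
control = [10.0, 12.0, 11.0, 13.0, 9.0]
diseased = [15.0, 17.0, 16.0, 18.0, 14.0]
t_stat, dof = welch_t(control, diseased)
print(t_stat, dof)
```

In an untargeted experiment this calculation is repeated for every detected feature, which is why the multiple-testing correction discussed later matters.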
Forty IDA data files from a TripleTOF 5600 system were loaded into the XCMSplus Software as well as uploaded to XCMS Online for comparison. The data files took around 24 hours to upload and process using the online version: uploading this amount of data (~3.23 GB) took 4-5 hours, and processing took a further 16-18 hours. The same 40 IDA data files were loaded and processed in XCMSplus Software; data loading took around one minute and processing took around 50-60 minutes. Therefore a comparison between the two approaches is as follows:
Where results are reviewed and data needs to be re-processed, the online approach means allowing another 24 hours. With XCMSplus this time is greatly reduced, and a user can review reprocessed results after about an hour.
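The timing figures quoted above can be combined into a rough speed-up estimate. The short calculation below uses only the ranges given in the text; the variable and function names are illustrative. Note that the summed online stages come to 20-23 hours, which the text rounds to around 24 hours.

```python
# (low, high) durations in hours, taken from the figures quoted in the text
online_upload = (4.0, 5.0)
online_processing = (16.0, 18.0)
local_loading = (1 / 60, 1 / 60)        # about one minute
local_processing = (50 / 60, 60 / 60)   # 50-60 minutes


def total(*stages):
    """Sum the low ends and the high ends of several (low, high) ranges."""
    return (sum(s[0] for s in stages), sum(s[1] for s in stages))


online = total(online_upload, online_processing)
local = total(local_loading, local_processing)

# Most conservative and most favourable ratios of online to local time
speedup_low = online[0] / local[1]
speedup_high = online[1] / local[0]
print(online, local, round(speedup_low, 1), round(speedup_high, 1))
```

On these figures local processing is roughly 20 to 27 times faster per pass, and the gain repeats every time a dataset is re-processed.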
XCMSplus Software outputs a results panel with an interactive workspace (Figure 2) where each extracted ion chromatogram can be reviewed for each significant feature. There is also an array of multivariate reports, from principal component analysis to an interactive heat map with hierarchical cluster analysis. The principal component analysis allows users to define which scaling method is best for the data and see both scores and loadings plots with metabolite annotations. In the interactive heat map, both extracted ion chromatograms (XICs) and spectral plots are given for each feature. This allows for fast, efficient interaction with the data and understanding of the experiment.
Interrogating the LC-MS data from the Zucker rat study, the different phenotypes were observed to cluster together in the PCA Scores plot (Figure 3). The different phenotypes (fatty, lean and obese) are colored to show the sample groups.
From the scores plot, the user can “view results table”. The table can be sorted based on p-value significance or q-value significance (Figure 4, top). The q-value is the false discovery rate adjusted p-value. The feature table identifies the retention time at which a particular feature was found as well as the intensity. Other columns of interest can be added by the user and used for sorting, such as m/z value, retention time (RT), correlation variation (Corr Var), maximum intensity (Max Int) and feature group (Feature Gp). Isotopes and adducts are also identified, and feature grouping brings together any related ions.
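As an illustration of how a false discovery rate adjusted p-value can be derived, the sketch below applies the Benjamini-Hochberg procedure to a small set of made-up p-values. The text does not state which FDR method XCMSplus uses, so the choice of Benjamini-Hochberg here is an assumption.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-feature p-values
q = bh_adjust(p)
print(q)
```

Sorting a feature table on the adjusted value rather than the raw p-value keeps the expected proportion of false positives among the reported features under control.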
From the feature table, a feature can be highlighted and the XIC can be visualized as an overlay from all the samples processed. One can then review the raw data and see whether this is an actual feature, just noise, or an irrelevant peak eluting in the solvent front. Retention time information as well as the accurate mass is displayed in this XIC plot (Figure 4, middle). The mass spectrum for that XIC can be viewed, as well as a simple box and whisker plot highlighting the differences between the groups. Finally, each feature is matched for identification to the composite database. If MS/MS spectra are available, the column labelled “MS/MS” is populated with “y”, meaning MS/MS is available. The database identifications are listed in a table (Figure 4) and ranked by m/z error (ppm) and then alphanumerically. Any MS/MS confirmation can then be made using the METLIN website (https://metlin.scripps.edu).
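The ranking of database identifications by m/z error can be sketched as follows. The candidate names and m/z values are hypothetical, and the code illustrates the ordering described in the text (absolute ppm error first, then name) rather than the actual XCMSplus matching logic.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


observed = 200.0000  # hypothetical measured m/z for one feature

# (name, theoretical m/z) pairs: placeholders, not real database entries
candidates = [
    ("compound_B", 200.0010),
    ("compound_A", 200.0010),
    ("compound_C", 199.9980),
    ("compound_D", 200.0004),
]

# Rank by absolute ppm error, breaking ties by name
ranked = sorted(
    candidates,
    key=lambda c: (abs(ppm_error(observed, c[1])), c[0]),
)
for name, mz in ranked:
    print(name, round(ppm_error(observed, mz), 2))
```

Candidates with identical theoretical masses, such as isomers, tie on ppm error, which is why a secondary sort and MS/MS confirmation are needed to separate them.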
The most compelling visualization tool in XCMSplus software is the interactive cloud plot (Figure 5). Key visualization features include:
The cloud plot is completely interactive and can be filtered on any function listed above. Greater stringency can therefore be applied to the p-value and to the mean fold change, so that only the most significant features are displayed.
The end result from XCMSplus is a confirmed list of significant metabolites. From the Zucker rat study, many lipid changes were observed among the three phenotypes studied (lean, fatty and obese), as well as changes in other small metabolites including bile acids.
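The effect of tightening the p-value and fold-change thresholds can be sketched as a simple filter over a feature list. The `Feature` fields, the threshold defaults and the example values below are all hypothetical and do not reflect the XCMSplus data model.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float
    rt_min: float
    p_value: float
    fold_change: float  # ratio of group means; > 1 is up, < 1 is down


def filter_features(features, max_p=0.01, min_fold=1.5):
    """Keep features with p <= max_p and a fold change of at least min_fold in either direction."""
    kept = []
    for f in features:
        # Treat a 4-fold decrease (0.25) the same as a 4-fold increase (4.0)
        magnitude = max(f.fold_change, 1 / f.fold_change)
        if f.p_value <= max_p and magnitude >= min_fold:
            kept.append(f)
    return kept


features = [
    Feature(301.2, 5.1, 0.001, 3.0),
    Feature(455.3, 7.8, 0.2, 4.0),      # fails the p-value threshold
    Feature(182.1, 1.2, 0.005, 1.1),    # fails the fold-change threshold
    Feature(520.3, 9.4, 0.0001, 0.25),  # strong decrease
]
default_view = filter_features(features)
strict_view = filter_features(features, max_p=0.0005, min_fold=2.0)
print(len(default_view), len(strict_view))
```

Raising the stringency shrinks the displayed set to the features most likely to merit follow-up identification.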
XCMSplus software can accelerate project completion time from weeks to days with local processing and batch analysis. With unlimited data storage capacity and the security of a local desktop package, XCMSplus offers the complete solution for untargeted metabolomics research. Beyond the robust data processing and visualization features of XCMSplus, virtually the entire metabolomics community can privately (and publicly) share their data and results within the XCMSplus software.