Advanced data processing software enables autonomous metabolite identification on the X500R QTOF System with XCMSplus software
Using XCMSplus Software
Baljit K. Ubhi1 and Cyrus Papan3
1SCIEX, CA, USA and SCIEX, Germany
Metabolomics is the scientific study of the chemical fingerprints that cellular processes leave behind¹. Untargeted metabolomics studies are valuable to biomarker researchers for studying disease effects over time and for identifying novel biomarkers to track disease progression. Mass spectrometry tends to be the analytical tool of choice given its higher sensitivity compared with other techniques such as ¹H NMR spectroscopy. The new X500R QTOF System was developed for routine, robust workflows and requires minimal MS expertise. Data previously acquired from a prostate cancer study were processed through the XCMSplus software pipeline as a proof of concept, to highlight the advanced data processing features required for untargeted metabolomics studies. XCMS software is described as the “World’s most cited metabolomics software”, with more than 1000 citations in the literature, and it is widely trusted in the metabolomics community. The software allows users to load, process and analyze their data in one interactive workspace. Peaks are annotated and matched against the METLIN database for metabolite identification.
Sample Preparation: Urine samples were obtained with disease classifications that had previously been determined using accepted clinical techniques. The specific gravity of each sample was measured by placing 15 µL of urine on a refractometer prism. A 50 µL volume of the thawed urine sample was then transferred to a clean, labeled microcentrifuge filter tube. An isotopically labeled internal standard mixture (20 µL) was added. The urine sample was then diluted with 400 µL of 98:2 acetonitrile/water with 0.1% sodium azide and vortexed. The sample was centrifuged and the supernatant was isolated, dried down and reconstituted in 50 µL of 0.1% formic acid in water. The samples were transferred to glass vials and loaded into the autosampler.
Chromatography: The reverse phase HPLC separation was performed using a Shimadzu LC System operating at a flow rate of 350 µL/min. The column was an Ace Excel C18-PFP column (100 x 1 mm, 2 µm) from ACE, maintained at 30 °C. A standard reverse phase gradient was used, with 0.1% formic acid in water as mobile phase A and acetonitrile as mobile phase B. The injection volume was 3 µL in positive ion mode and 5 µL in negative ion mode.
Mass Spectrometry: The data was collected using information dependent acquisition (IDA) on the X500R QTOF System. Using optimized source conditions, the MS mass range analyzed was 50-600 m/z and the MS/MS was acquired with a mass range of 40-600 m/z with a 25 msec accumulation time.
The collision energy was set to 35 V with a 15 V collision energy spread to ensure high quality MS/MS on most metabolites.
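The collision energy setting above can be read as a voltage window. The following is a minimal sketch, assuming the spread is applied symmetrically around the set collision energy; the function name is hypothetical and is not part of any instrument software.

```python
def collision_energy_window(ce_volts, spread_volts):
    """Return (low, center, high) collision energies in volts.

    Assumption: the spread is applied symmetrically about the set value.
    """
    if spread_volts < 0:
        raise ValueError("spread must be non-negative")
    return (ce_volts - spread_volts, ce_volts, ce_volts + spread_volts)


# Settings quoted in the text: 35 V collision energy, 15 V spread.
window = collision_energy_window(35, 15)
print(window)
```

Under that assumption, fragile and stable metabolites are both covered by a single MS/MS experiment spanning low to high energy.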
Data Processing: The data were processed in XCMSplus software, a desktop version of XCMS Online. The wiff2 data were converted to mzXML format using the msconvert tool of ProteoWizard (3.0.9992) and pushed through XCMSplus to pick peaks, align features, normalize the data and subsequently aid metabolite identification of the significant extracted features.
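The conversion step could be scripted along the following lines. This is a sketch only: it assumes msconvert is on the PATH and that the installed ProteoWizard build can read wiff2 files, and the helper names and the dry-run default are illustrative choices, not part of the published workflow.

```python
from pathlib import Path
import shutil
import subprocess


def build_msconvert_command(raw_file, out_dir):
    """Build an msconvert call that writes mzXML output into out_dir."""
    return ["msconvert", str(raw_file), "--mzXML", "-o", str(out_dir)]


def convert_all(raw_dir, out_dir, dry_run=True):
    """Build (and optionally run) one msconvert command per .wiff2 file."""
    raw_files = sorted(Path(raw_dir).glob("*.wiff2"))
    commands = [build_msconvert_command(f, out_dir) for f in raw_files]
    if not dry_run:
        if shutil.which("msconvert") is None:
            raise RuntimeError("msconvert was not found on the PATH")
        for command in commands:
            # check=True stops the batch at the first failed conversion.
            subprocess.run(command, check=True)
    return commands


example = build_msconvert_command("sample_01.wiff2", "mzxml_out")
print(" ".join(example))
```

With `dry_run=True` the commands are only returned, which makes it easy to inspect them before converting a whole batch.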
From the scores plot (Figure 3) one can see a difference along the y-axis between the samples classified as healthy and those classified as diseased. From here the user can select “view results table” to see a table of extracted features. The table can be sorted by p-value or by q-value (Figure 4). The q-value is the false discovery rate adjusted p-value (not featured in this figure). The feature table lists the retention time at which each feature was found as well as its intensity. The fold change of each feature between sample groups is listed in the column labeled “fold” (Figure 4). Other columns of interest can be added by the user and used for sorting, such as m/z value, retention time (RT), correlation variation (Corr Var), maximum intensity (Max Int) and feature group (Feature Gp). Any isotopes and adducts are also identified, and feature grouping collects any related ions together.
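To make the p-value, q-value and fold columns concrete, the sketch below computes them for a few made-up features. It assumes the Benjamini-Hochberg procedure for the false discovery rate adjustment and a plain ratio of group means for the fold change; the text does not state which methods XCMSplus uses, and all feature names and numbers here are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


def fold_change(mean_group_a, mean_group_b):
    """Ratio of mean intensities between two sample groups."""
    return mean_group_a / mean_group_b


# Hypothetical features: (label, p-value, mean intensity diseased, mean intensity healthy)
features = [
    ("F1", 0.01, 8000.0, 2000.0),
    ("F2", 0.04, 1500.0, 1000.0),
    ("F3", 0.03, 900.0, 300.0),
    ("F4", 0.005, 5000.0, 1000.0),
]

q_values = benjamini_hochberg([f[1] for f in features])
table = [
    {"feature": f[0], "p": f[1], "q": q, "fold": fold_change(f[2], f[3])}
    for f, q in zip(features, q_values)
]
# Sort as a results table might be sorted: by q-value, then by p-value.
table.sort(key=lambda row: (row["q"], row["p"]))
for row in table:
    print(row)
```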
From the feature table, a feature can be highlighted and its XIC can be visualized as an overlay from all the samples processed (Figure 4, top right). One can then review the raw data and decide whether this is an actual feature, just noise, or an irrelevant peak eluting in the solvent front. Retention time information and the accurate mass are displayed in this XIC plot. The mass spectrum for that XIC can be viewed, as well as a simple box and whisker plot highlighting the differences in intensity between the groups, including the average and the upper and lower limits of the intensity per group (Figure 4, middle right). Finally, each feature is matched against the METLIN database for identification. If MS/MS spectra are available, the column labelled “MS/MS” is populated with “y” (not shown here). The database identifications are listed in a table (Figure 4, bottom right) and ranked by m/z error (ppm) and then alphanumerically. Using the composite ID column, one can click the METLIN ID and be referred to the METLIN website (https://metlin.scripps.edu), where reference MS/MS spectra can be compared with the experimental spectra for confirmation (Figure 5). A range of MS/MS spectra are available at varying collision energies.
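The ranking by m/z error described above can be sketched as follows. The ppm formula is the standard relative mass error; the candidate names and m/z values are placeholders rather than METLIN entries, and the tie-break by name is an assumption about what "alphanumerically" means here.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def rank_candidates(observed_mz, candidates):
    """Sort (name, theoretical m/z) pairs by absolute ppm error, then by name."""
    return sorted(
        candidates,
        key=lambda c: (abs(ppm_error(observed_mz, c[1])), c[0]),
    )


# Placeholder candidates for one observed feature (not real database hits).
observed = 180.0650
candidates = [
    ("candidate B", 180.0634),
    ("candidate A", 180.0634),
    ("candidate C", 180.0655),
]

ranked = rank_candidates(observed, candidates)
for name, mz in ranked:
    print(f"{name}: {ppm_error(observed, mz):+.2f} ppm")
```

Candidates that share a theoretical m/z (isomers, for example) have identical ppm errors, so the name alone decides their order; MS/MS comparison is what distinguishes them.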
The most compelling visualization tool in XCMSplus software is the interactive cloud plot (Figure 6). Key visualization features include:
The cloud plot is completely interactive and can be filtered on any function listed above. For example, the thresholds for the p-value and the mean fold change can be made more stringent so that only the most significant features are displayed.
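The effect of tightening the filters can be illustrated with a small sketch. The threshold values, field names and features below are illustrative assumptions and do not come from the XCMSplus interface or from the study data.

```python
def filter_features(features, max_p=0.01, min_fold=1.5):
    """Keep features that pass both the p-value and the fold change threshold."""
    return [
        f for f in features
        if f["p_value"] <= max_p and f["fold"] >= min_fold
    ]


# Hypothetical feature list standing in for a results table.
features = [
    {"name": "F1", "p_value": 0.0005, "fold": 4.2},
    {"name": "F2", "p_value": 0.008, "fold": 1.8},
    {"name": "F3", "p_value": 0.03, "fold": 6.0},
    {"name": "F4", "p_value": 0.002, "fold": 1.2},
]

relaxed = filter_features(features)
strict = filter_features(features, max_p=0.001, min_fold=2.0)
print([f["name"] for f in relaxed])
print([f["name"] for f in strict])
```

Both conditions must hold at once, so a feature with a large fold change but a weak p-value (F3 here) is hidden just like one with a strong p-value but a small fold change (F4).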
The X500R QTOF System is a robust, easy-to-operate benchtop system that requires minimal MS expertise to perform untargeted metabolomics analyses. As metabolomics continues to expand in disease research, robust, easy-to-use solutions that provide quality answers will be increasingly important.
In this study, samples from a pilot prostate cancer study were analyzed, and a clear difference between healthy and diseased urine samples was detected using this untargeted metabolomics approach, confirming the original disease classifications.
XCMSplus software was used to extract the features from the dataset and simultaneously identify them against a large, well-known database. All of the data review and statistical data analysis could be completed within a single software package.
The software allows advanced data processing and data analysis which are often required by researchers for untargeted metabolomics studies.
This pilot study provided confidence in the approach, and the next phase of the study, analyzing a much larger set of samples, is underway.