MarkerView™ Software 1.2.1 for metabolomic and biomarker profiling analysis

Abstract

With the large amount of quantitative data that mass spectrometers can generate today, data processing tools are needed that can process the data, mine the results efficiently, and present them as easy-to-understand visuals. MarkerView Software is powerful, easy-to-use software designed for the statistical analysis of data acquired on all SCIEX mass spectrometers, in applications ranging from omics experiments to food authenticity, water testing and many others. With both multivariate and univariate statistical capabilities, including principal component analysis (PCA), principal component variable grouping (PCVG), and t-tests, MarkerView Software makes features of interest easy to discover. Normalization algorithms, peak detection algorithms, and extensive reporting capabilities round out the functionality and make the software highly versatile for downstream analysis.


Introduction

MarkerView Software is a novel program designed for metabolomics applications and biomarker profiling workflows [1]. Using sophisticated statistical analyses and graphical tools, MarkerView Software allows rapid review of data acquired on all SCIEX mass spectrometers, for example to determine up- and down-regulation of components in complex samples. Its statistical capabilities include principal component analysis, principal component variable grouping and t-tests. Analysis can follow a supervised approach (where the sample groupings are known ahead of time), an unsupervised approach (where the inherent sample groupings are not known prior to analysis), or a combination of the two. After data analysis, MarkerView Software provides powerful data visualization tools and report generation capabilities so that features of interest can be easily discovered and work can be readily tracked.

Figure 1. MarkerView Software. MarkerView Software uses PCA to reveal similarities and differences in a set of biological samples [4]. Here, the Scores plot (top left) shows that replicate injections of the same sample are clustered together while different samples are well separated. The Loadings plot (top right) indicates which variables are responsible for the observed separation, and PCVG has been used to determine correlated variables, which are given the same symbol. The bottom two plots show the profiles (each variable's response across all samples) for all variables in group 3 (lower left) and group 6 (lower right). Clearly, these groups contain variables that separate oranges and grapefruit from the rest of the samples. Each group contains several hundred individual variables.

Key features of MarkerView™ Software

  • Perform chromatographic and spectral peak picking to find true peaks in complex samples
  • Automatically align mass and retention time to compensate for minor variations, ensuring that identical compounds in different samples are accurately compared
  • Process data using supervised and/or unsupervised workflows with principal component analysis as well as t-tests to accelerate the discovery of new biomarkers
  • Principal Component Variable Grouping (PCVG) to assist data interpretation by locating and highlighting correlated variables
  • Link back to raw mass spectra and extracted ion chromatograms to support identification of putative biomarkers
  • Automatically create reports on potential biomarkers and export data to third-party statistical packages for additional data mining
  • Import multiple types of data acquired on any SCIEX mass spectrometer, including both ESI- and MALDI-based platforms

Processing of raw data

MarkerView Software uses sophisticated processing algorithms that accurately find chromatographic and spectral peaks directly from raw data in complex data sets. Data alignment by the software compensates for minor variations in both mass and retention time values, ensuring that identical compounds in different samples are accurately compared to one another (Figure 2). Subsequent normalization of the data accounts for sample-to-sample variation, such as differences in sample size or injection volume, and provides more precise results. MarkerView Software can process data acquired by LC-MS, MALDI-MS, flow injection, and/or direct infusion on all SCIEX mass spectrometers. After trends are found, it can link back to raw mass spectra and extracted ion chromatograms to support identification of features.
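
The following Python sketch is a rough illustration of the alignment step. It groups peaks from different samples into a single feature when their m/z and retention time agree within fixed tolerances. It is a minimal, hypothetical example and does not reproduce MarkerView Software's algorithm. The `Peak` type, the greedy matching rule, and the default tolerances (0.01 m/z, 0.2 min) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    sample: str   # sample (injection) name
    mz: float     # m/z value
    rt: float     # retention time in minutes
    area: float   # integrated peak area


def align_peaks(peaks, mz_tol=0.01, rt_tol=0.2):
    """Hypothetical greedy alignment: a peak joins the first feature whose
    running-mean m/z and RT are both within tolerance and which does not yet
    contain a peak from the same sample; otherwise it starts a new feature."""
    features = []
    for p in sorted(peaks, key=lambda q: (q.mz, q.rt)):
        for f in features:
            close = abs(f["mz"] - p.mz) <= mz_tol and abs(f["rt"] - p.rt) <= rt_tol
            if close and p.sample not in f["areas"]:
                n = len(f["areas"])
                # Update the feature's running-mean m/z and RT.
                f["mz"] = (f["mz"] * n + p.mz) / (n + 1)
                f["rt"] = (f["rt"] * n + p.rt) / (n + 1)
                f["areas"][p.sample] = p.area
                break
        else:
            # No matching feature: this peak starts a new one.
            features.append({"mz": p.mz, "rt": p.rt, "areas": {p.sample: p.area}})
    return features
```

Each returned feature maps sample names to areas, which is the sample-by-variable table that the later statistics work on. Scanning every existing feature for each peak keeps the sketch short. A real implementation would more likely use an indexed lookup.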

Direct Import of Processed Data

Processed data can also be exported from other software and imported directly into MarkerView Software for statistical analysis, which significantly streamlines the processing pipeline. In this case, the primary data processing software identifies and quantifies the known analytes, and data analysis is then undertaken in MarkerView Software. Software packages that can export data to MarkerView Software include:

  • SWATH® Acquisition MicroApp in PeakView® Software
  • LipidView™ Software
  • MasterView™ Software
  • SCIEX OS

Figure 2. Feature alignment. The Alignment & Filtering dialog box compensates for minor variations in mass and retention time, automatically ensuring that identical compounds in different samples are accurately compared. Retention time correction can be based on standards if available. 
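
One way to picture retention time correction based on standards is to fit a mapping from the observed retention times of the standards to their reference values. The sketch below assumes a straight-line (least-squares) correction. The function name and the linear model are illustrative assumptions, and the software's own correction may differ.

```python
def fit_rt_correction(observed, reference):
    """Fit reference_rt = slope * observed_rt + intercept by least squares over
    the standards and return a function that corrects any retention time.
    The linear model is an assumption made for this sketch."""
    n = len(observed)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two standards with matching reference RTs")
    mean_obs = sum(observed) / n
    mean_ref = sum(reference) / n
    sxx = sum((x - mean_obs) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("standards must elute at different times")
    sxy = sum((x - mean_obs) * (y - mean_ref) for x, y in zip(observed, reference))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_obs

    def correct(rt):
        # Map an observed retention time onto the reference time scale.
        return slope * rt + intercept

    return correct
```

For example, suppose the standards elute at 2.1, 5.1 and 10.1 min instead of 2.0, 5.0 and 10.0 min. The fitted correction then shifts a feature observed at 7.1 min back to 7.0 min.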

Principal component analysis

MarkerView Software allows visualization of many types of data with principal component analysis (PCA). PCA is an unsupervised multivariate statistical approach that allows trends to be recognized across groups of samples within a dataset. These trends are represented graphically in a Scores plot (Figure 3, left), which shows the largest variation in the dataset as principal component 1 (PC1, x-axis) and the second-largest variation as PC2 (y-axis). Reviewing the Loadings plot (Figure 3, right) provides insight into the variables that lead to any sample trend in the Scores plot. For example, it can illustrate which compounds are up- or down-regulated. Discriminant analysis (PCA-DA), a supervised version of PCA, can also be employed. In PCA-DA, prior knowledge of the sample groups is used to determine the variables that maximize the variation between groups and minimize the variation within each group.

Figure 3. Principal component analysis. Powerful statistical tools such as Principal Component Analysis (PCA) enable grouping of samples based on common features. The Scores plot (left) shows groups and differences among the samples, for example the quality of technical replicates can quickly be ascertained as well as the categorization of samples based on differentially observed components. The Loadings Plot (right) reflects the variables that are causing the separation and those with higher loadings values are generally more significant for the separation.
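
The sketch below makes the Scores and Loadings plots concrete. It computes both for mean-centred data using power iteration with deflation, so it needs only the Python standard library. The function name, the absence of any scaling, and the iteration count are assumptions for illustration. Production code would normally use a singular value decomposition from a numerical library instead.

```python
import math
import random


def pca(data, n_components=2, iters=500, seed=0):
    """PCA by power iteration with deflation (illustrative sketch).

    data: list of samples (rows), each a list of variable responses (columns).
    Returns (scores, loadings): scores[k][i] is sample i on PC k+1 and
    loadings[k][j] is the weight of variable j on PC k+1.
    """
    n_samples, n_vars = len(data), len(data[0])
    # Mean-centre each variable (column); no scaling is applied here.
    means = [sum(row[j] for row in data) / n_samples for j in range(n_vars)]
    x = [[row[j] - means[j] for j in range(n_vars)] for row in data]
    rng = random.Random(seed)
    scores, loadings = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(n_vars)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            # One power-iteration step: v <- (X^T X) v, then renormalise.
            xv = [sum(r[j] * v[j] for j in range(n_vars)) for r in x]
            w = [sum(x[i][j] * xv[i] for i in range(n_samples)) for j in range(n_vars)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:
                break  # no variance left to explain
            v = [c / norm for c in w]
        t = [sum(r[j] * v[j] for j in range(n_vars)) for r in x]
        scores.append(t)
        loadings.append(v)
        # Deflate: remove this component before finding the next one.
        x = [[x[i][j] - t[i] * v[j] for j in range(n_vars)] for i in range(n_samples)]
    return scores, loadings
```

Each list in `scores` gives one axis of a Scores plot, with one value per sample. Each list in `loadings` gives one axis of the Loadings plot, with one value per variable. A variable that is constant across all samples receives a loading of zero.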

Principal component variable grouping

Principal Component Variable Grouping (PCVG) is a tool that analyzes the PCA loadings values to find correlated variables, i.e. those that share a common expression pattern across the samples [2]. In effect, it uses the samples to group the variables. This simplifies interpretation because there are usually far fewer groups than variables, and the behavior of any one variable represents the rest of its group.
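
The sketch below illustrates the idea of grouping variables by their loadings. Variables whose loading vectors point in nearly the same direction in PC space behave alike across the samples. This greedy cosine-similarity grouping is a simplified stand-in written for illustration and is not the published PCVG algorithm. The function name and the 0.95 threshold are hypothetical.

```python
import math


def group_variables(loadings, min_cos=0.95):
    """Hypothetical greedy grouping of variables by loading direction.

    loadings: one vector per variable (its loadings on the selected PCs).
    The longest unassigned vector seeds a group; unassigned variables whose
    vectors have cosine similarity >= min_cos with the seed join that group.
    Returns a group number per variable (None for zero-length loadings)."""
    lengths = [math.sqrt(sum(c * c for c in v)) for v in loadings]
    order = sorted(range(len(loadings)), key=lambda i: -lengths[i])
    group_of = [None] * len(loadings)
    next_group = 1
    for seed in order:
        if group_of[seed] is not None or lengths[seed] == 0:
            continue
        group_of[seed] = next_group
        for j in order:
            if group_of[j] is None and lengths[j] > 0:
                dot = sum(a * b for a, b in zip(loadings[seed], loadings[j]))
                if dot / (lengths[seed] * lengths[j]) >= min_cos:
                    group_of[j] = next_group
        next_group += 1
    return group_of
```

Anti-correlated variables, whose loading vectors point in opposite directions, end up in separate groups under this rule. Any one member of a group can then stand in for the whole group when plotting profiles.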

The MarkerView Software user interface allows all members of a group to be selected together for display. It also allows them to be removed if their behavior is not relevant (for example, contamination). The top-right graph in Figure 1 shows an example of PCVG. There, the variables are color-coded according to the sets of variables responsible for separating the samples into groups. At the bottom, Group 3 and Group 6 variables are displayed in Profile plots, making it easy to identify the variables that distinguish the groups.

T-Test

The t-test is a supervised analysis technique and is useful when two or more predetermined classes of samples are present. MarkerView Software will perform a pair-wise comparison of all the classes or, alternatively, compare one class to all others. The results of the t-test indicate how well each variable distinguishes the two groups being compared. This is reported as a p-value, i.e. the probability of observing a difference at least as large as the measured one if the variable did not actually differ between the groups.
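
The per-variable t-test can be sketched in a few lines. The example below assumes Welch's unequal-variance form of the two-sample t-test, because the text does not say which variant MarkerView Software uses. The two-sided p-value is computed from the regularized incomplete beta function, evaluated with the standard continued-fraction method so that only the Python standard library is needed. All function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def welch_t_test(group_a, group_b):
    """Return (t, degrees of freedom, two-sided p-value) for one variable."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    se2 = va / na + vb / nb
    if se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    # Two-sided p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

For responses of 1, 2, 3 in one group and 4, 5, 6 in the other, this gives t ≈ -3.67 with 4 degrees of freedom and p ≈ 0.021. The function would be applied once per variable to build the table behind a volcano plot.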

The profiles for individual variables can be generated, or the overall behavior can be summarized in displays such as the “volcano plot” (Figure 5). For any two groups, the volcano plot shows, for each variable, the p-value plotted against the log of the fold change (the ratio of the average responses in the two groups). Especially significant variables are those that have a low p-value (a small probability of arising by chance) and a large fold change. At the extremes of the horizontal axis are groups of variables that are completely absent in one sample group, that is, variables with an infinite fold change. In the example shown in Figure 5, the data were first processed with PCA and PCVG, so the different symbols indicate the correlated variables.
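
The volcano-plot selection logic can be sketched as follows, including the infinite fold change of a variable absent from one group. The thresholds are illustrative assumptions and not MarkerView defaults: p ≤ 0.05 and at least a two-fold change, i.e. |log2 fold change| ≥ 1. The function names are also hypothetical.

```python
import math


def log2_fold_change(mean_a, mean_b):
    """log2 of the ratio of average responses (group A / group B).
    A variable absent from one group gets an infinite fold change."""
    if mean_a < 0 or mean_b < 0:
        raise ValueError("responses must be non-negative")
    if mean_a == 0 and mean_b == 0:
        return math.nan  # absent from both groups: ratio undefined
    if mean_b == 0:
        return math.inf
    if mean_a == 0:
        return -math.inf
    return math.log2(mean_a / mean_b)


def volcano_hits(variables, p_max=0.05, min_abs_log2_fc=1.0):
    """variables maps name -> (mean_a, mean_b, p_value).
    Returns, sorted, the names with a low p-value and a large fold change."""
    hits = []
    for name, (mean_a, mean_b, p) in variables.items():
        lfc = log2_fold_change(mean_a, mean_b)
        # Comparisons with NaN are False, so undefined ratios are never hits.
        if p <= p_max and abs(lfc) >= min_abs_log2_fc:
            hits.append(name)
    return sorted(hits)
```

Variables with an infinite fold change pass the fold-change test automatically, which matches their position at the extremes of the horizontal axis. They are still reported only if their p-value is also low.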

Figure 5. 2D T-Tests. The volcano plot of p-value vs. log (fold change) combines the fold change in the response of a variable between groups with the probability (p-value) that the variable distinguishes the groups to quickly highlight important variables.

Figure 6. Easy data visualization. Additional visualization graphics and tools aid in the detailed investigation of trends across samples. By plotting the abundance of one or more variables across multiple samples, features that behave similarly can be quickly spotted, as in this example, where selected variables distinguish strawberries and organic strawberries from other fruit samples.

Data normalization

For more precise quantification, data normalization is sometimes needed to account for systematic variation across a dataset. There are a number of ways to perform normalization, and the right choice depends on the experiment and data type. MarkerView Software offers several normalization strategies:

  • Using Internal Standards or Selected Peaks – specific features are used to normalize across the dataset
  • Using Total Area Sums – the sum of all peak areas is used to compute the scaling factor
  • Using Most Likely Ratio (MLR) [3] – used when a large number of features vary; the ratios of individual features between samples are used to compute a ratio histogram, which is aligned across samples
  • Using Manual Scale Factors – scale factors computed by other means are applied
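
Two of these strategies are sketched below. Total area sum scaling follows the description above directly. The MLR function is a simplified interpretation and an assumption rather than the exact published method: it builds a histogram of per-feature log ratios against a reference sample and uses the median ratio in the most populated bin. Function names and the bin width are hypothetical.

```python
import math
import statistics
from collections import Counter


def total_area_sum_factors(samples):
    """samples: list of samples, each a list of peak areas for the same features.
    Returns one scale factor per sample that brings every total area to the
    mean total area."""
    totals = [sum(s) for s in samples]
    target = sum(totals) / len(totals)
    return [target / t for t in totals]


def most_likely_ratio_factor(sample, reference, bin_width=0.05):
    """Simplified MLR-style factor for `sample` relative to `reference`:
    histogram the per-feature log2(reference / sample) ratios and return the
    median ratio inside the most populated bin."""
    logs = [math.log2(r / s) for s, r in zip(sample, reference) if s > 0 and r > 0]
    if not logs:
        raise ValueError("no features detected in both samples")
    counts = Counter(math.floor(x / bin_width) for x in logs)
    # Most populated bin; ties go to the bin closest to a ratio of 1.
    mode_bin = max(counts, key=lambda b: (counts[b], -abs(b)))
    in_mode = [x for x in logs if math.floor(x / bin_width) == mode_bin]
    return 2 ** statistics.median(in_mode)
```

As an example, take `reference = [100, 200, 300, 400, 50]` and `sample = [50, 100, 150, 200, 400]`. Four features are exactly half as intense in `sample`, and one is strongly elevated. The MLR-style factor is 2.0, whereas scaling `sample` to the reference's total area would give only about 1.17. This illustrates why a ratio-based approach helps when many features change.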

Other data visualization plots

When analyzing complex data sets, tools that allow visualization and manipulation of the data to discover true patterns and correlations are critical. In MarkerView Software, the traditional Scores and Loadings plots generated from PCA are valuable. In addition, direct links to raw mass spectra and chromatographic data provide a simple way to review multiple data files simultaneously to confirm findings (Figure 7).

Profile plots can be easily generated to show the intensity of selected compounds across multiple samples to assist in finding features of interest (Figure 6). This view can also highlight expression patterns resulting from systematic errors or contamination that can then be excluded.

Create reports

During data review, MarkerView Software allows you to create a list of compounds of interest. This list can be annotated and is used to automatically generate a report in Microsoft Word containing user-specified information. Both the Word template and the report layout can easily be customized. Information that can be reported includes the Scores and Loadings plots, Profile plots, mass spectra, and extracted ion chromatograms for all compounds of interest, such as suspected biomarkers.

Figure 7. Automatic links to raw data. The data shown here were obtained by analyzing urine from three rats before (open circles, top left) and after (closed circles) administration of a single dose of vinpocetine. The upper two panels show the Scores (left) and Loadings (right) plots. The variables within the blue rectangle were selected for display and used to generate the profile plots in the lower left panel, which shows the response for the six samples. Four data points were selected and the raw spectra displayed (bottom right). The most intense variable (m/z 323.2 at 16.3 min) is clearly absent from the pre-dose samples (orange and green traces).

Summary

Easily go from raw data to results: MarkerView Software provides the essential tools for statistical analysis and visualization of large numbers of samples while helping to prevent data overload. Originally designed for metabolomics and biomarker profiling researchers, the powerful and easy-to-use features of MarkerView Software help accelerate the discovery of differentially expressed compounds and new biomarkers. MarkerView Software is also useful to food and forensic drug chemists, helping them identify minor component differences between samples to aid in determining the source and/or authenticity of substances.

References

  1. Sangster TP, et al. (2007) Investigation of analytical variation in metabonomic analysis using liquid chromatography/mass spectrometry. Rapid Commun. Mass Spectrom. 21, 2965-2970.
  2. Ivosev G, et al. (2008) Dimensionality Reduction and Visualization in Principal Component Analysis. Anal. Chem. 80, 4933-4944.
  3. Lambert JP, et al. (2013) Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. Nat. Methods 10, 1239-1245.
  4. Enabling systems biology driven proteome wide quantitation of Mycobacterium tuberculosis. SCIEX technical note RUO-MKT-02-4292-A.