Simplifying the use of ion libraries during data processing of SWATH acquisition proteomics data
Ion library merging with auto retention time calibration in OneOmics suite
Matthes Huebsch1, Adam Lau1, Kathleen Lewis2, Nick Morrice3, Sara Ahadi4, Christie Hunter2
1SCIEX, Canada, 2SCIEX, USA, 3SCIEX, UK, 4Stanford University, USA
As the use of data independent acquisition (DIA) grows in proteomics research, the need for improved data processing workflows increases. The most common DIA data processing workflow is to use spectral ion libraries to drive targeted extraction of peptide / fragment peak areas from the data, using the m/z and retention time information contained in the library. Increasing the size and quality of the ion library has been shown to increase the number of proteins reliably detected and quantified from a dataset1. Retention time (RT) correlation between the ion library and the dataset is another key factor that determines the quality of data extraction. Currently, the user either doses in a standard peptide mix or manually selects endogenous peptides to be used as retention time markers between the data and the ion library.
This work explored how to simplify the use of ion libraries during DIA / SWATH acquisition data extraction. Two algorithms were developed to simplify data processing within the OneOmics suite cloud processing pipeline: the Library Merging algorithm and the AutoRT Calibration algorithm.
Sample preparation: Digested cell lysates were obtained from various sources for testing, including human cell lysates, mouse cell lysates and human peripheral blood mononuclear cell (PBMC) lysates.
Chromatography: Separation of the digests was performed on a NanoLC 425 system (SCIEX) operating in trap elute mode at either nanoflow or microflow rates. A 0.3 x 150 mm column and a 0.35 x 10 mm trap were used for the microflow separations. A 75 µm x 150 mm column and a 0.35 x 0.5 mm trap were used for the nanoflow separations. All traps and columns used were packed with ChromXP™ C18CL, 5 µm, 120 Å phase.
Mass spectrometry: MS analyses were performed using either data dependent acquisition (DDA) or SWATH acquisition on a TripleTOF 6600 system. Either the NanoSpray source or the Turbo V source with a 25 μm I.D. hybrid electrode (SCIEX) was used for ionization. Variable Q1 window SWATH acquisition methods (100 windows)2 were built using high sensitivity MS/MS mode with Analyst TF software 1.7.1.
Data Processing: DDA data were processed with ProteinPilot software and the group file was used as the spectral ion library. Library files and SWATH acquisition data were uploaded to the Cloud using CloudConnect and data were processed using OneOmics suite (Figure 1).
Two algorithms within the OneOmics suite cloud processing pipeline were tested: the Library Merging algorithm and the AutoRT Calibration algorithm. During library merging, ProteinPilot software group files are combined by selecting the largest file as the seed library, then peptides from the smaller libraries are merged in using a non-linear calibration strategy.3 New peptides are added to existing proteins, and new proteins are added if they are not present in the seed library. During SWATH acquisition processing, endogenous peptides are automatically selected across the retention time range and the best peptides are chosen based on precursor intensity and ID confidence.4 The best scoring peak groups are then used for RT calibration. All comparisons of the number of proteins/peptides quantified were done using <1% FDR and <20% CV filters.
Validating the automatic retention time calibration algorithm
To evaluate the quality of the retention time alignment during SWATH acquisition data processing using AutoRT Calibration, results were compared on the same SWATH acquisition data set using both AutoRT and the typical RT calibration process using a set of spiked standard peptides (RT Cal). In 33/48 validation tests, the AutoRT Calibration approach quantified a similar or greater number of proteins than the standard approach and gave similar linear fit equations (data not shown). The algorithm was then applied to a series of additional biological examples. In the simplest case, a mouse 1D library was generated and used to process the SWATH acquisition replicates on a set of four mouse cell line samples; both DDA and SWATH acquisition data were collected with the same gradients. The AutoRT algorithm provided a very similar retention time calibration line to the standard synthetic peptide approach and resulted in very similar numbers of proteins and peptides quantified (Figure 2).
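The idea of selecting endogenous anchor peptides across the time range and fitting a calibration line can be sketched as follows. This is only an illustration under stated assumptions: the field names (`lib_rt`, `obs_rt`, `intensity`, `confidence`), the binning scheme, the confidence threshold and the function names are hypothetical, and the actual AutoRT scoring of peak groups is not described in enough detail here to reproduce.

```python
def select_anchors(peptides, n_bins=4, min_confidence=0.99):
    """Pick the most intense confident peptide in each library RT bin.

    peptides: list of dicts with hypothetical keys
    'lib_rt', 'obs_rt', 'intensity' and 'confidence'.
    """
    candidates = [p for p in peptides if p["confidence"] >= min_confidence]
    if not candidates:
        return []
    lo = min(p["lib_rt"] for p in candidates)
    hi = max(p["lib_rt"] for p in candidates)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero-width bin
    best = {}
    for p in candidates:
        b = min(int((p["lib_rt"] - lo) / width), n_bins - 1)
        if b not in best or p["intensity"] > best[b]["intensity"]:
            best[b] = p
    return [best[b] for b in sorted(best)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Toy data: observed RT is an exact linear function of library RT.
peptides = [
    {"lib_rt": rt, "obs_rt": 1.1 * rt + 2.0, "intensity": inten, "confidence": conf}
    for rt, inten, conf in [
        (5.0, 2e5, 0.99), (10.0, 8e5, 0.99), (15.0, 9e5, 0.50), (20.0, 3e5, 0.99),
        (25.0, 4e5, 0.99), (30.0, 1e5, 0.99), (35.0, 6e5, 0.99), (40.0, 5e5, 0.99),
    ]
]
anchors = select_anchors(peptides)
slope, intercept = fit_line(
    [a["lib_rt"] for a in anchors], [a["obs_rt"] for a in anchors]
)
print(len(anchors), round(slope, 3), round(intercept, 3))
```

Binning by library retention time keeps the anchors spread over the whole gradient, so the fitted line is not dominated by a cluster of intense peptides eluting in one region. Comparing the slope and intercept from this fit with those from a spiked standard set is one simple way to check that two calibrations agree.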
Next, the use of libraries generated at different flow rates was explored. A large human library collected using microflow on 15 high pH fractions was used to process SWATH acquisition replicates collected at both nano and microflow rates (Figure 3, SWATH Performance Kit Library5). The RT calibration lines for microflow (RT Cal - blue and AutoCal - green) were very similar and resulted in very similar numbers of proteins and peptides quantified. There was a slight difference in RT calibration when using the microflow library with nanoflow data, most likely because of small changes in gradient profiles, seen as the slight curvature in the red points at the higher organic end of the gradient. This resulted in a small drop in the number of peptides quantified using the AutoRT approach. This highlights the importance of using gradient profiles of similar linearity across both library generation and SWATH acquisition experiments (gradient duration can change but the profile should be similar).
Integrated Library Merging with AutoRT Calibration
To explore the effect of merging libraries, a series of libraries generated from 2D-LC fractionation runs performed on PBMC digests were combined and used to extract SWATH acquisition data replicates on PBMC digests. First, the use of a single SCX library was compared using AutoRT and RT calibration with manually selected endogenous peptides. Very similar linear fits were obtained with both techniques and similar numbers of peptides and proteins were quantified (within 4%, Figure 4, top).
Next, the libraries were sequentially combined and each was used to process the same PBMC SWATH acquisition data; results were compared to the results obtained using the first SCX 2D library and RT calibration with endogenous peptides. Combining the high pH library with the initial SCX library provided a gain of 20% in peptides and 8% in proteins quantified, as new peptides were added to the library from the orthogonal fractionation experiment. A second high pH library was also merged in, and provided additional but smaller gains. Note that these fractionation libraries were generated using manual spin columns, so less fractionation was achieved.
Next, a series of SWATH acquisition experiments on mouse cell lines were studied. Here, a library generated from a 1D data dependent acquisition was used to process the SWATH acquisition data with the AutoRT calibration algorithm; this was the benchmark (Figure 5). Next, a set of three 1D DDA runs was processed into a single library and used to extract the same SWATH acquisition data. Significant gains in peptides quantified were observed (105%), as the multiple DDA runs provided more peptide coverage for the higher abundance proteins, but there was minimal increase in proteins quantified. However, a larger 2D library generated from mouse 3T3 cells was then merged into the 1D library and again used to process the same SWATH acquisition data. In this case the expanded library did provide additional proteins that were quantifiable from the SWATH acquisition data (31% over the 1D library). This resulted in ~2000 proteins quantified from the mouse cell lines with ~9900 peptides quantified.
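The gains reported in this section are based on counting the peptides or proteins that pass the <1% FDR and <20% CV filters. A minimal sketch of that bookkeeping is shown below; the row layout, function names and example values are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev


def percent_cv(areas):
    """Coefficient of variation (%) of replicate peak areas."""
    return 100.0 * stdev(areas) / mean(areas)


def count_quantified(rows, max_fdr=0.01, max_cv=20.0):
    """Count rows passing both filters.

    rows: list of (name, fdr, replicate_areas) tuples (hypothetical layout).
    """
    count = 0
    for _name, fdr, areas in rows:
        if fdr >= max_fdr or len(areas) < 2 or mean(areas) <= 0:
            continue
        if percent_cv(areas) < max_cv:
            count += 1
    return count


def percent_gain(new_count, base_count):
    """Relative gain (%) of a new library over the benchmark library."""
    return 100.0 * (new_count - base_count) / base_count


# Toy rows with made-up values, for illustration only.
rows = [
    ("pep_A", 0.001, [100.0, 105.0, 95.0]),   # passes: CV is 5%
    ("pep_B", 0.001, [100.0, 200.0, 50.0]),   # fails the CV filter
    ("pep_C", 0.050, [100.0, 101.0, 99.0]),   # fails the FDR filter
]
print(count_quantified(rows))
```

Applying the same filter to the results from each library and passing the two counts to `percent_gain` gives the kind of relative comparison used above, with the benchmark library as the denominator.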
Two algorithms for merging ion libraries and performing retention time calibration have been developed to further streamline the SWATH acquisition data processing pipeline in OneOmics suite.
- The AutoRT Calibration algorithm performed similarly to the RT calibration process using manually selected peptides or dosed synthetic peptides, providing similar numbers of peptides quantified while significantly improving ease of use.
- Even translating microflow libraries for use with nanoflow SWATH acquisition data worked well (Figure 3), providing similar linear equations after retention time calibration.
- Merging relevant 1D and 2D LC-MS/MS ion libraries provides gains in proteins and peptides quantified, especially when the additional libraries were generated using orthogonal 2D separations to provide better protein coverage of a similar sample type.
- The ability to visualize the retention time calibration lines highlights the importance of using similar, linear LC gradients when planning to share and merge libraries.