Using DIA-NN software and the ZenoTOF 7600 system
Christie Hunter and Alexandra Antonoplis
SCIEX, USA
As data independent acquisition (DIA) continues to grow as a powerful workflow for quantitative proteomics, new workflows and software tools have emerged. Here, Zeno SWATH DIA data were processed using DIA-NN software and an in silico-generated library to identify large numbers of proteins from a complex proteomic sample. Approximately 2-fold more proteins were identified from a cell lysate using a 45-min LC gradient than with the more traditional data dependent acquisition (DDA) approach coupled with database searching. Other workflow comparisons were performed to benchmark the workflow and are discussed.
While obtaining accurate quantification information for proteins across sample sets is key to understanding biology, the proteins present in the sample must first be identified. Thus, the ability to identify proteins present in a biological sample remains an important workflow. There are 2 main workflows by which researchers can identify very large numbers of proteins. The traditional workflow, termed shotgun proteomics, is a data-dependent acquisition (DDA) workflow in which MS/MS data are acquired for detected precursors and then all MS/MS spectra are subjected to a database search to identify the peptides and proteins in the sample.1
Over the last 10 years, data-independent acquisition (DIA or SWATH DIA) has emerged as a powerful workflow for quantitative proteomics. In this workflow, MS/MS spectra are acquired for all detectable peptides and peptide peak areas are extracted using experiment-specific spectral libraries. More recently, multiple powerful algorithms have emerged that have enabled proteins to be identified from DIA data without having to generate an experimental spectral library in advance. These advances have increased the ease of using DIA for protein identification workflows.2
Here, the traditional DDA workflow (using Zeno DDA) for protein identification was compared to the Zeno SWATH DIA workflow, in which proteins are identified using a library generated in silico from a FASTA file. The numbers of proteins and peptides identified with these approaches were compared for 45-min and 10-min microflow LC gradients. The results generated by the library-free Zeno SWATH DIA workflow were compared with other library-based approaches.
Sample preparation: A digest of human K562 cell lysate from the SWATH performance kit (SCIEX) was used. Sample loadings ranged from 12.5–400 ng on column.
Chromatography: Separations were performed using a Waters ACQUITY UPLC M-class system plumbed for microflow chromatography (5 µL/min), using a Phenomenex micro trap analytical column. All data were acquired using either a 45-min or a 10-min gradient. See previous technical note for more details.5
Mass spectrometry: Data were acquired on 2 ZenoTOF 7600 systems with either Zeno DDA or variable window Zeno SWATH DIA. Data were acquired in triplicate with the Zeno trap enabled in MS/MS mode for all acquisitions. See previous technical note for more details.5
Data processing: Zeno DDA data were processed using the ProteinPilot App in OneOmics suite using the canonical+isoforms version of a human UniProt FASTA file. Protein and peptide FDR numbers (<1% global FDR) were obtained from the Analytics dashboard. Zeno SWATH DIA data were processed using DIA-NN software and a library-free process.4 First, a spectral library was generated from the canonical+isoforms version of a UniProt_Human FASTA file, then this in silico library was used to process the DIA data to determine the number of proteins identified. A <1% precursor FDR filter was applied. Processing settings were previously described in detail in a community post.6 Protein and peptide precursor areas from the *.pg_matrix.tsv and *.pr_matrix.tsv files, respectively, were copied into Microsoft Excel and the numbers of proteins and peptides quantified at <20% CV were computed.
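The spreadsheet step described above can also be scripted. The following is a minimal Python sketch, not the authors' procedure: it assumes a tab-separated matrix with one row per protein or precursor and one peak-area column per replicate. It counts a row as identified if it has an area in at least one run, and as quantified if it has areas in all runs with a CV below 20%. The column names (`run1`, `run2`, `run3`) and the rule requiring a value in every replicate are illustrative assumptions.

```python
import csv
import io
import statistics


def count_quantified(tsv_text, run_columns, cv_threshold=0.20):
    """Return (identified, quantified) counts for a matrix of peak areas.

    identified: rows with an area in at least one run.
    quantified: rows with an area in every run and CV < cv_threshold.
    The all-replicates requirement is an assumption of this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    identified = 0
    quantified = 0
    for row in reader:
        areas = []
        for col in run_columns:
            value = row.get(col, "")
            if value not in ("", "NA", None):  # skip missing areas
                areas.append(float(value))
        if areas:
            identified += 1
        if len(areas) == len(run_columns) and len(areas) > 1:
            mean = statistics.mean(areas)
            # Sample standard deviation divided by the mean gives the CV.
            cv = statistics.stdev(areas) / mean if mean > 0 else float("inf")
            if cv < cv_threshold:
                quantified += 1
    return identified, quantified


# Hypothetical three-replicate matrix (values are made up for illustration).
example = (
    "Protein.Group\trun1\trun2\trun3\n"
    "P1\t100\t105\t95\n"
    "P2\t100\t200\t50\n"
    "P3\t80\t\t82\n"
)
print(count_quantified(example, ["run1", "run2", "run3"]))  # (3, 1)
```

To apply this to an exported matrix file, read the file contents into a string and pass the names of the replicate columns as they appear in the file header.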
Zeno SWATH DIA data were later processed in DIA-NN software using the Pan Human Library5 and a large library generated on the ZenoTOF 7600 system8 to compare the performance of the library-free method. This large library was generated from two fractionation experiments of two human cell lines (HeLa, K562), which were each processed into a single search result in the ProteinPilot app in OneOmics suite. The search results for each cell line were then merged and retention-time aligned using the Extractor application to create a final ion library.
When the Zeno trap is activated on the ZenoTOF 7600 system, MS/MS sensitivity is enhanced as much as 5- to 6-fold for peptides.4,6 This MS/MS acquisition mode can be applied to both DDA and DIA workflows. Here, these workflows were compared on their ability to identify large numbers of proteins and peptides from a complex proteome digest that was separated with microflow chromatography. The gains in proteins identified and quantified using the Zeno SWATH DIA workflow were previously characterized.6 Here, the data collected using the Zeno DDA workflow are compared to those acquired using the Zeno SWATH DIA workflow.
The Zeno DDA data were processed using the ProteinPilot app in OneOmics suite using a Thorough search to identify the maximum number of peptides and proteins (<1% global FDR) in the dataset. The Zeno SWATH DIA data were processed using a library generated in silico from a human FASTA file using DIA-NN software. An in silico library can be generated in the span of a few hours, depending on the size of the FASTA file selected. It can then be saved to later process the Zeno SWATH DIA data in DIA-NN software, using the standard library extraction processing workflow.
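The two-step process described above (generate an in silico library once, then reuse it for library-based extraction) can be sketched as two DIA-NN command lines. The Python helper below only assembles the argument lists and does not run anything. The executable name, all file paths and the choice of options are assumptions made for illustration. They are not the processing settings from the community post cited in the methods, and the name of the library file that DIA-NN actually writes can vary by version.

```python
def library_generation_command(fasta_path, out_lib_path, diann_exe="diann"):
    """Step 1: predict an in silico spectral library from a FASTA file."""
    return [
        diann_exe,              # hypothetical executable name
        "--fasta", fasta_path,
        "--fasta-search",       # digest the FASTA in silico
        "--gen-spec-lib",       # write a spectral library
        "--predictor",          # predict spectra and retention times
        "--out-lib", out_lib_path,
    ]


def dia_processing_command(raw_files, library_path, fasta_path, report_path,
                           diann_exe="diann"):
    """Step 2: process DIA runs against the saved library."""
    command = [diann_exe]
    for raw_file in raw_files:
        command += ["--f", raw_file]
    command += [
        "--lib", library_path,
        "--fasta", fasta_path,
        "--out", report_path,
        "--qvalue", "0.01",     # 1% precursor FDR filter
        "--matrices",           # also write the pr/pg matrix files
    ]
    return command


# Hypothetical paths, for illustration only.
step1 = library_generation_command("human.fasta", "human_lib.tsv")
step2 = dia_processing_command(
    ["run_1.wiff", "run_2.wiff", "run_3.wiff"],
    "human_lib.predicted.speclib",
    "human.fasta",
    "report.tsv",
)
print(" ".join(step1))
print(" ".join(step2))
```

Keeping library generation separate from data processing mirrors the text: the slower prediction step runs once per FASTA file, and the saved library is then reused for each new dataset.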
More proteins and peptides were identified from the Zeno SWATH DIA data than from the Zeno DDA data, as shown in Figures 1 and 2, respectively. Approximately 2-fold more proteins were identified across the sample loading range tested and this result was consistent between the 2 ZenoTOF 7600 systems tested. The lists of identified proteins generated by the Zeno DDA and Zeno SWATH DIA workflows were compared and nearly all proteins identified from the Zeno DDA data were also identified using the Zeno SWATH DIA approach (Figure 3).
Next, the results generated using the emerging library-free approach were compared with those from a traditional SWATH DIA data processing strategy that uses experimentally generated spectral libraries to extract peptide and protein data from a SWATH DIA dataset. Two experimentally derived libraries were selected for comparison: a large library previously generated using the ZenoTOF 7600 system and fast microflow chromatography7 and the publicly available Pan Human library.3 Zeno SWATH DIA data that were acquired in triplicate were processed using DIA-NN software. The numbers of proteins identified (<1% FDR) and quantified (<20% CV) using the library-free approach and the experimentally derived libraries were compared (Figure 4).
The protein identification numbers from the 3 different data processing approaches were similar, highlighting the power of the library-free approach. The proteins identified using the in silico generated library and using the ZenoTOF 7600 system library were compared to determine whether similar proteins were found by each approach. The Venn diagram (Figure 5) shows that very similar lists of proteins were generated, adding further confidence to the approach.
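A list comparison of this kind reduces to set operations on protein identifiers. The sketch below shows one way to compute the counts behind a two-set Venn diagram. The function name and the example accessions are hypothetical and do not reproduce the data in the figures.

```python
def overlap_summary(proteins_a, proteins_b):
    """Summarize the overlap between two lists of protein identifiers."""
    a, b = set(proteins_a), set(proteins_b)
    shared = a & b
    return {
        "only_a": len(a - b),
        "shared": len(shared),
        "only_b": len(b - a),
        # Fraction of list A that was also found in list B.
        "fraction_of_a_shared": len(shared) / len(a) if a else 0.0,
    }


# Made-up identifiers for illustration only.
list_a = ["P1", "P2", "P3", "P4"]
list_b = ["P2", "P3", "P4", "P5", "P6"]
print(overlap_summary(list_a, list_b))
```

The same function applies to any pair of result lists, provided both use the same identifier scheme (for example, the same accession format and the same handling of protein groups).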
The Zeno SWATH DIA approach offers the additional advantage that the rich MS/MS dataset it generates can be used to derive more robust, specific, quantitative data, rather than relying on MS1 data for quantification as with Zeno DDA. Approximately 90% of the proteins identified were also quantified with <20% CV across triplicate analyses (Figure 4) for each library strategy when using the Zeno SWATH DIA workflow.
The study was repeated using a rapid 10-min LC gradient to determine whether the Zeno SWATH DIA workflow would yield similar gains in identifications with faster chromatography. With the 45-min gradient, ~2-fold gains were observed across the load curve with the Zeno SWATH DIA approach, relative to the Zeno DDA approach (Figure 6). With the faster gradient, the gains were larger: 2.5- to 3-fold more proteins were identified from the Zeno SWATH DIA data under these conditions. This observation is expected, as DDA approaches are stochastic and therefore less suited to analyzing large numbers of rapidly eluting precursors. The load curves for the proteins and peptides identified from the 10-min gradient data are shown in Figure 7, highlighting the improvements observed with the Zeno SWATH DIA approach over Zeno DDA.
The Zeno SWATH DIA workflow described here was superior to the traditional shotgun proteomics approach using Zeno DDA for the identification of large numbers of proteins from a complex proteomic sample. Using an in silico-generated library for data processing with DIA-NN software, ~2-fold more proteins were identified from a cell lysate using a 45-min LC gradient and 2.5- to 3-fold more were identified using a 10-min gradient. Comparing the proteins identified using the in silico-generated library to 2 commonly used experimentally generated libraries demonstrated high agreement, lending confidence to the library-free workflow as an easy and comprehensive alternative for large-scale protein identification experiments.