ProteinPilot™ Software computes two key protein scores: the Total ProtScore and the Unused ProtScore. The Total ProtScore is the sum of all the peptide evidence related to a given protein, whereas the Unused ProtScore counts only the peptide evidence that has not already been claimed by a higher-ranked protein, that is, the evidence unique to that protein. Because proteins often share homology (similarity at the sequence level), a database search often identifies peptides that point to more than one protein. The Pro Group™ Algorithm works to resolve this complexity, with the goal of reporting only the proteins that are truly present rather than proteins that are merely artifacts of the protein inference challenge.
After peptide identification is complete and the Pro Group Algorithm has been run to assemble the list of proteins, the Total ProtScore of every protein is computed and the proteins are ranked by it. Starting from the top of the list, all the peptide evidence is assigned to the first protein, so its Unused ProtScore equals its Total ProtScore. Each protein further down the list is then analyzed in turn: when a peptide supporting that protein has already been used by a protein higher up the list, the score for that peptide is removed from the lower protein and its Unused ProtScore is recalculated. For such a protein, the Unused ProtScore is therefore less than the Total ProtScore and reflects only the unique evidence that supports its presence. On every iteration of this recalculation, the protein list is re-sorted according to the Unused ProtScore values.
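The ranking procedure described above can be illustrated with a minimal Python sketch. It is not the Pro Group Algorithm itself, only a simplified greedy model of the idea. The function name `unused_prot_scores`, the protein and peptide labels, and the per-peptide score values are all hypothetical, and the sketch leaves out the recalculation of peptide confidences.

```python
def unused_prot_scores(protein_peptides, peptide_scores):
    """Simplified model of Total vs. Unused scoring (illustration only).

    protein_peptides: dict mapping protein name -> set of peptide IDs
    peptide_scores:   dict mapping peptide ID -> score contribution (hypothetical)
    Returns (total, ranked), where ranked lists (protein, unused score)
    in the order the proteins are placed on the final list.
    """
    # Total score: sum of all peptide evidence pointing to the protein.
    total = {
        protein: sum(peptide_scores[p] for p in peptides)
        for protein, peptides in protein_peptides.items()
    }

    claimed = set()                    # peptides already used by a higher-ranked protein
    remaining = set(protein_peptides)  # proteins not yet placed on the list
    ranked = []

    while remaining:
        # Recalculate the unused score of every unplaced protein,
        # counting only peptides that nobody higher up has claimed.
        unused = {
            protein: sum(peptide_scores[p] for p in protein_peptides[protein] - claimed)
            for protein in remaining
        }
        # Re-sort: take the protein with the most unused evidence
        # (ties broken by total score, then by name, for determinism).
        best = min(remaining, key=lambda pr: (-unused[pr], -total[pr], pr))
        ranked.append((best, unused[best]))
        claimed |= protein_peptides[best]
        remaining.remove(best)

    return total, ranked


# Hypothetical input: three proteins sharing some peptides.
protein_peptides = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
}
peptide_scores = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 0.5}

total, ranked = unused_prot_scores(protein_peptides, peptide_scores)
for protein, unused in ranked:
    print(protein, "total:", total[protein], "unused:", unused)
```

In this made-up input, P2 has a higher total score than P3, but all of P2's peptides are claimed by P1, so after re-sorting P3 ranks above P2.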
Why is this important? Imagine a scenario where you have a set of protein isoforms that share some amount of sequence homology. The search has found nine peptide identifications that associate with these proteins, as shown in the Figure. Eight peptides point to Protein A, and because it has the most evidence, it is ranked highest of the three on the protein list. Protein B has 4 peptides associated with it, but all of them have already been claimed by Protein A, so Protein B has no unique peptides and therefore an Unused ProtScore of 0. Protein C has 5 peptides that associate with its sequence, but again 4 of them have already been claimed by Protein A. The fifth peptide is unique to Protein C, so our confidence that Protein C has been detected depends on the confidence of that single unique peptide hit, and the Unused ProtScore is based only on that peptide. Protein C will therefore be ranked much lower down the protein list.
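The peptide counts in this scenario can be checked with a few lines of Python set arithmetic. The peptide labels (`pep1` to `pep9`) and the choice of which of Protein A's peptides are shared with B and C are assumptions made for illustration; the text only fixes the counts.

```python
# Hypothetical peptide labels; only the counts come from the scenario above.
isoform_a = {f"pep{i}" for i in range(1, 9)}           # 8 peptides
isoform_b = {"pep1", "pep2", "pep3", "pep4"}           # 4 peptides, all shared with A
isoform_c = {"pep5", "pep6", "pep7", "pep8", "pep9"}   # 4 shared with A, 1 unique

all_peptides = isoform_a | isoform_b | isoform_c

# Protein A ranks first and claims all of its peptides.
claimed = set(isoform_a)

# Evidence left over for the lower-ranked isoforms.
unused_b = isoform_b - claimed
unused_c = isoform_c - claimed

print(len(all_peptides))  # number of distinct peptide identifications
print(sorted(unused_b))   # peptides still supporting Protein B
print(sorted(unused_c))   # peptides still supporting Protein C
```

Protein B is left with no unclaimed peptides, and Protein C is left with exactly one, which matches the reasoning in the scenario.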
For more detailed information, please refer to the Help documentation that installs with ProteinPilot Software.
Another key element of this process is that the peptide confidences are recalculated during grouping. As correct peptides are assigned to proteins, the distribution of correct peptides is depleted relative to the distribution of incorrect peptides, which requires a recalculation of confidence to maintain accuracy.