Why should I use canonical sequence reviewed databases in my ProteinPilot software search?


Date: 07/12/2023

For research use only. Not for use in diagnostic procedures.


Answer

A canonical, reviewed database in FASTA file format is preferred for searches with the ProteinPilot software. Understanding how a canonical database is curated explains why searches against this type of database result in fewer false positive identifications.

What is the canonical sequence?

Each UniProtKB/Swiss-Prot entry contains all curated protein products encoded by a given gene in a given species or strain. For each UniProtKB/Swiss-Prot entry, UniProt chooses a canonical (or representative) sequence for display. This sequence should conform to at least one of the criteria described at this link: https://www.uniprot.org/help/canonical_and_isoforms

Are all isoforms described in one UniProtKB/Swiss-Prot entry?

Whenever possible, all the protein products encoded by one gene in a given species are described in a single UniProtKB/Swiss-Prot entry, including isoforms generated by alternative splicing, alternative promoter usage, and alternative translation initiation (*). However, some alternative splicing isoforms derived from the same gene share only a few exons, if any at all. The same is true for some 'trans-splicing' events. In these cases, the sequences diverge too much to be merged into a single entry, and the isoforms have to be described in separate 'external' entries.

Example: isoforms derived from the lola gene (Drosophila melanogaster)

(*) Important remark: Because of the growing volume of sequence data coming from large-scale sequencing projects, UniProtKB/TrEMBL may contain additional predicted sequences encoded by genes that are described in a UniProtKB/Swiss-Prot entry.
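
Example: checking a FASTA file for reviewed canonical entries

The distinctions above are visible in the headers of a UniProt FASTA file: reviewed UniProtKB/Swiss-Prot entries start with ">sp|", unreviewed UniProtKB/TrEMBL entries start with ">tr|", and isoform sequences carry a suffix on the accession (for example "-2"). The following Python sketch keeps only the reviewed canonical records. It is an illustration only and is not part of the ProteinPilot software. The function names are hypothetical, and the accessions and sequences in the sample are made-up placeholders, not real UniProt records.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# UniProt FASTA header layout: >db|ACCESSION|ENTRY_NAME description ...
# db is "sp" (Swiss-Prot, reviewed) or "tr" (TrEMBL, unreviewed).
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_reviewed_canonical(header: str) -> bool:
    """True for Swiss-Prot entries whose accession has no isoform suffix."""
    match = HEADER.match(header)
    if match is None:
        return False  # header does not follow the UniProt layout
    database, accession = match.group(1), match.group(2)
    return database == "sp" and "-" not in accession


def filter_canonical(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only reviewed canonical records."""
    return [(h, s) for h, s in read_fasta(lines) if is_reviewed_canonical(h)]


# Made-up placeholder records for illustration only.
sample = [
    ">sp|P00000|DEMO1_HUMAN Demo protein OS=Homo sapiens",
    "MKTAYIAK",
    "QRQISFVK",
    ">sp|P00000-2|DEMO1_HUMAN Isoform 2 of Demo protein OS=Homo sapiens",
    "MKTAYK",
    ">tr|A0A000XXX0|A0A000XXX0_HUMAN Predicted protein OS=Homo sapiens",
    "MSSSS",
]

kept = filter_canonical(sample)
for header, sequence in kept:
    print(header, len(sequence))
```

With a real file, the same function can be applied to an open file handle, for example filter_canonical(open("my_database.fasta")), where the file name is a placeholder. A check like this can help confirm that a downloaded database contains only reviewed canonical sequences before it is used in a search.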