Date: | 07/12/2023 |
For research use only. Not for use in diagnostic procedures.
Answer
A canonical database in FASTA file format is preferred for use with the ProteinPilot software. Understanding how a canonical database is curated explains why searches against this type of database result in fewer false-positive identifications.
What is the canonical sequence?
Each UniProtKB/Swiss-Prot entry contains all curated protein products encoded by a given gene in a given species or strain. For each UniProtKB/Swiss-Prot entry, UniProt chooses a canonical (or representative) sequence for display that should conform to at least one of the criteria described at this link: https://www.uniprot.org/help/canonical_and_isoforms
Whenever possible, all the protein products encoded by one gene in a given species are described in a single UniProtKB/Swiss-Prot entry, including isoforms generated by alternative splicing, alternative promoter usage, and alternative translation initiation (*). However, some alternative splicing isoforms derived from the same gene share only a few exons, if any at all; the same is true of some 'trans-splicing' events. In these cases, the divergence is too great to merge all protein sequences into a single entry, and the isoforms have to be described in separate 'external' entries.
Example: isoforms derived from the lola gene (Drosophila melanogaster)
(*) Important remark: Due to the increase in sequence data coming from large-scale sequencing projects, UniProtKB/TrEMBL may contain additional predicted sequences encoded by genes that are already described in a UniProtKB/Swiss-Prot entry.
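To illustrate the distinction above, the following is a minimal Python sketch of how canonical Swiss-Prot records might be separated from isoform and TrEMBL records in a UniProt-style FASTA file. It relies on the UniProt FASTA header convention (">sp|accession|entry name" for Swiss-Prot, ">tr|..." for TrEMBL, and a "-N" suffix on the accession for additional isoforms). The function names and the toy records are hypothetical and exist only for this example; this is not a ProteinPilot feature or a UniProt tool. Note that isoforms described in separate 'external' entries have their own accessions and would not be detected by the "-N" suffix check.

```python
import re
from typing import Iterable, Iterator, Tuple

# UniProt-style FASTA header: ">db|ACCESSION|ENTRY_NAME description"
# db is "sp" (Swiss-Prot, reviewed) or "tr" (TrEMBL, unreviewed).
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_canonical_swissprot(header: str) -> bool:
    """True for Swiss-Prot headers whose accession has no '-N' isoform suffix."""
    match = HEADER.match(header)
    if match is None:
        return False  # not a UniProt-style header
    db, accession = match.groups()
    return db == "sp" and "-" not in accession


def filter_canonical(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep only canonical Swiss-Prot records."""
    for header, sequence in read_fasta(lines):
        if is_canonical_swissprot(header):
            yield header, sequence


if __name__ == "__main__":
    # Toy records with placeholder accessions and sequences (not real data).
    demo = [
        ">sp|P00000|DEMO_HUMAN Demo protein OS=Homo sapiens",
        "MKTAYIAKQR",
        ">sp|P00000-2|DEMO_HUMAN Isoform 2 of Demo protein OS=Homo sapiens",
        "MKTAYQR",
        ">tr|A0A000DEMO|A0A000DEMO_HUMAN Predicted protein OS=Homo sapiens",
        "MKTAYIAKQRL",
    ]
    for header, sequence in filter_canonical(demo):
        print(header)
        print(sequence)
```

With the toy input, only the first record is kept: the second is dropped because of the "-2" isoform suffix, and the third because it comes from TrEMBL. Whether such a filtered file suits a particular search workflow should be checked against the software documentation.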