Currently, if you fetch from Uniprot, PDB or EMBL, a compound sequence name is made, e.g.
UNIPROT>accession>accession>accession>…|name|name|…
PDB>pdbId>name>chain>id (?)
EMBL>accession
but if you fetch from Pfam, Rfam or Ensembl, the sequence name is just the accession id.
Is there a rationale to this?
I would like to know since SequenceIdMatcher depends on it.
It has to know to look for a sequence called “UNIPROT|P1560” to resolve a UNIPROT database reference, but not to include the source database if resolving an ENSEMBL reference, which seems ad hoc.
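To illustrate the special-casing I mean, here is a rough sketch. It is not the actual SequenceIdMatcher code: the class, the lookupName helper, the set of prefixed sources and the "|" separator are all assumptions made up for this example, based only on the names described above.

```java
import java.util.Locale;
import java.util.Set;

public class LookupNameSketch {
    // Assumption: sources whose fetched sequences carry a "SOURCE|accession"
    // style name. The real list and format may differ.
    private static final Set<String> PREFIXED_SOURCES =
            Set.of("UNIPROT", "PDB", "EMBL");

    /**
     * Hypothetical helper: returns the sequence name a matcher would have to
     * search for in order to resolve a reference to the given source database.
     */
    public static String lookupName(String source, String accession) {
        String src = source.toUpperCase(Locale.ROOT);
        if (PREFIXED_SOURCES.contains(src)) {
            // Compound name: the source database is part of the sequence name.
            return src + "|" + accession;
        }
        // Pfam, Rfam, Ensembl: the name is just the accession id.
        return accession;
    }

    public static void main(String[] args) {
        System.out.println(lookupName("UNIPROT", "P1560"));
        System.out.println(lookupName("ENSEMBL", "someEnsemblId"));
    }
}
```

The point is that the matcher needs a hard-coded list like PREFIXED_SOURCES to know which naming scheme applies to which source.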
Or does this problem go away when ‘primary db reference’ (JAL-2106) is, well, resolved? I guess that will remove the overloading of the sequence name with this information.
Any thoughts?
thanks
Mungo Carstairs
Jalview Computational Scientist
The Barton Group
Division of Computational Biology
School of Life Sciences
University of Dundee, Dundee, Scotland, UK.
www.jalview.org
www.compbio.dundee.ac.uk