Folks,
I was wondering if there is going to be an option to retrieve sequences using the UniParc_ID, as there have been a lot of changes with UniProt moving sequences to UniParc if they are not part of a reference database. It would be helpful.
It seems that currently all UniProt proteins have a UniParc_ID but not the other way around.
Yours
Adrian Lapthorn
This sounds like something we should be able to manage. Would you want to be able to specify UniParc explicitly, or have Jalview try it as a fallback when a UniProt ID does not resolve?
I'm also keen to speed up retrieval: the UniProt API is much quicker for bulk downloads, but we aren't using it directly except for free text search (which is painful when retrieving 200+ proteins!).
We’ll create an issue and post…
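For what it's worth, the "try UniParc as a fallback" option could be sketched roughly like this. It is only an illustration of the control flow, not how Jalview does or will do it: `fetch_with_fallback` and the two fetcher callables are hypothetical names, and the fetchers are passed in so the sketch makes no assumption about the actual retrieval code.

```python
from typing import Callable, Optional, Tuple

# A fetcher takes an identifier and returns a sequence record (here just a
# string), or None when the identifier does not resolve in that database.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    identifier: str,
    fetch_uniprot: Fetcher,
    fetch_uniparc: Fetcher,
) -> Tuple[str, Optional[str]]:
    """Try UniProt first, then UniParc; report which source answered."""
    record = fetch_uniprot(identifier)
    if record is not None:
        return "uniprot", record
    # Only reached when UniProt has no entry for this identifier.
    record = fetch_uniparc(identifier)
    if record is not None:
        return "uniparc", record
    return "missing", None


# Stand-in fetchers backed by dictionaries, with made-up placeholder IDs.
fake_uniprot = {"ID_A": "MKT..."}
fake_uniparc = {"ID_B": "MSV..."}

result_a = fetch_with_fallback("ID_A", fake_uniprot.get, fake_uniparc.get)
result_b = fetch_with_fallback("ID_B", fake_uniprot.get, fake_uniparc.get)
result_c = fetch_with_fallback("ID_C", fake_uniprot.get, fake_uniparc.get)
```

Returning the source alongside the record would let the caller label sequences that came from UniParc, which matters if their headers carry less annotation.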
I honestly don’t know.
If you take the following sequences:
A0A3E0HA18_9GAMM
A0A2P6ATW0_9GAMM
A0A9E0VTQ7_9GAMM
A0A9E0KY96_9GAMM
A0A3D4UFU9_9GAMM
A0A4Q7YJP1_9GAMM
A0A507WJS5_9GAMM
A0A2T5J2M7_9GAMM
A0A962JF89_9GAMM
Jalview retrieves 4 of these sequences and 5 are flagged as missing.
If you search for them one by one in UniProt, you either get an authentic UniProt entry
or you are forwarded to the UniParc page.
So on the one hand it would be good to have Jalview just get them all, if this is going to be continued long term.
Depending on how the databases develop, it might be useful to be able to retrieve by UniParc_IDs anyway. These are the ones for the sequences above:
UPI000E266B8A
UPI000CF5EBF6
UPI001B51B8DE
UPI001B6D84FF
UPI000EEE1ECE
UPI00102AECF8
UPI0011418F3D
UPI000D314BE4
UPI001D4D85E0
The sequence headers might be less informative, though, when compared to the UniProt ones.
Hope this helps
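In case it is useful for telling the two kinds of identifier apart, here is a rough sketch. It assumes UniParc IDs are always "UPI" followed by 10 hexadecimal characters and that UniProt entry names look like NAME_SPECIES, which fits the examples above but is a loose pattern of my own rather than an official one. The function names are made up, and the URL pattern for the UniProt REST service is an assumption that should be checked against the UniProt documentation.

```python
import re

# Assumed pattern: "UPI" plus 10 hexadecimal characters, e.g. UPI000E266B8A.
UNIPARC_RE = re.compile(r"^UPI[0-9A-F]{10}$")
# Loose, assumed pattern for entry names such as A0A3E0HA18_9GAMM.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify(identifier: str) -> str:
    """Return 'uniparc', 'uniprot_entry_name' or 'unknown' for an identifier."""
    ident = identifier.strip().upper()
    if UNIPARC_RE.match(ident):
        return "uniparc"
    if ENTRY_NAME_RE.match(ident):
        return "uniprot_entry_name"
    return "unknown"


def uniparc_fasta_url(upi: str) -> str:
    """Build a FASTA URL for a UniParc ID (URL layout is an assumption)."""
    if classify(upi) != "uniparc":
        raise ValueError(f"not a UniParc ID: {upi!r}")
    return f"https://rest.uniprot.org/uniparc/{upi.strip().upper()}.fasta"


kinds = [classify(i) for i in ("A0A3E0HA18_9GAMM", "UPI000E266B8A", "foo")]
```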
Adrian