gi to uniprot??

michael_wilson_bi · 29 July 2009 11:35

HI All!!

I really like the Jalview editor, but I often have multi-sequence FASTA files whose protein IDs come from NCBI. This doesn't allow me to use the links to info about the protein that are available with UniProt IDs. Is there some way I can make the links from gi| to UniProt in Jalview, without doing a manual lookup and translation??

Many thank yous to anyone that can help me in this regard. If this was automatically done by Jalview, it would, indeed, be perfect.

CHEERS!

Michael

jimp · 4 August 2009 10:02

Hello Michael.

michael wilson (BI) wrote:

I really like the Jalview editor,

great!

but I often have multi-sequence FASTA files whose protein IDs come from NCBI.

.. this is a real problem, which I hope to address in the next major release. Jalview is rather euro-centric at the moment, but there are plenty of services that we can now tap to make the jump between Entrez and UniProt IDs (and hopefully more soon).

This doesn't allow me to use the links to info about the protein that are available with UniProt IDs. Is there some way I can make the links from gi| to UniProt in Jalview, without doing a manual lookup and translation??

There isn't a direct way of doing the translation at the moment - and I can't point you to a service on the web that will do ID transliteration for a sequence alignment (i.e. paste in a FASTA file with one set of IDs, and get back another FASTA file with the IDs transliterated).
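
If you already have a gi-to-UniProt lookup table from somewhere else, the transliteration step itself is simple to script outside Jalview. Here is a minimal Python sketch of that idea. The function name relabel_fasta, the id_map dictionary and the EXAMPLE_ID placeholder are all made up for illustration; the mapping has to come from a source of your own, and nothing here is part of Jalview.

```python
import re

# Matches a FASTA header that starts with an NCBI gi number, e.g.
# ">gi|12805189|gb|AAH02053.1| description". Group 1 is the gi number;
# \S* swallows the rest of the ID token up to the first whitespace.
GI_HEADER = re.compile(r"^>gi\|(\d+)\S*")


def relabel_fasta(fasta_text, id_map):
    """Replace gi-style IDs in FASTA headers using id_map (gi number -> new ID).

    Headers whose gi number is not in id_map, and all sequence lines,
    are passed through unchanged.
    """
    relabelled = []
    for line in fasta_text.splitlines():
        match = GI_HEADER.match(line)
        if match and match.group(1) in id_map:
            # Keep the free-text description that follows the ID token.
            line = ">" + id_map[match.group(1)] + line[match.end():]
        relabelled.append(line)
    return "\n".join(relabelled)


if __name__ == "__main__":
    # EXAMPLE_ID is a placeholder, not a real UniProt accession.
    hypothetical_map = {"12805189": "EXAMPLE_ID"}
    fasta = ">gi|12805189|gb|AAH02053.1| Capping protein\nMSDQQLDCALDLMRRLPPQQ"
    print(relabel_fasta(fasta, hypothetical_map))
```

Only the header lines are touched, so the alignment itself is left exactly as it was.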

However, here is a quick fix for gi numbers using the user-definable regex URL facility:
1. Open the Preferences dialog and select the Connections tab.
2. Click the New button to create a new URL link.
3. Enter NCBI as the link name.
4. Enter 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=$SEQUENCE_ID=/(?:gi\|(\d+))/=$' (excluding the quotes) as the link URL.
5. Select OK, then select OK again to save the preferences.
6. Try opening the following sequence in Jalview and right-clicking it to get the links:

>gi|12805189|gb|AAH02053.1| Capping protein (actin filament) muscle Z-line, beta [Mus musculus]
MSDQQLDCALDLMRRLPPQQIEKNLSDLIDLVPSLCEDLLSSVDQPLKIARDKVVGKDYLLCDYNRDGDSYR
SPWSNKYDPPLEDGAMPSARLRKLEVEANNAFDQYRDLYFEGGVSSVYLWDLDHGFAGVILIKKAGDGSKKI
KGCWDSIHVVEVQEKSSGRTAHYKLTSTVMLWLQTNKSGSGTMNLGGSLTRQMEKDETVSDCSPHIANIGRL
VEDMENKIRSTLNEIYFGKTKDIVNGLRSVQTFADKSKQEALKNDLVEALKRKQQC

Hopefully you should see an NCBI link, and selecting it takes you to the record at NCBI.
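
If you want to check what the link should resolve to before setting it up, the substitution can be mimicked in a few lines of Python. This is only a sketch of the idea, not Jalview's own code. It assumes the intended pattern is (?:gi\|(\d+)) and that the text captured by the first pair of parentheses is what gets inserted into the URL, which is how I read the help. The names GI_PATTERN, URL_TEMPLATE and ncbi_link are made up for the example.

```python
import re

# Assumed pattern: find "gi|" followed by digits and capture only the digits.
GI_PATTERN = re.compile(r"(?:gi\|(\d+))")

# The link URL from the steps above, with {} standing in for the
# $SEQUENCE_ID=/.../=$ placeholder.
URL_TEMPLATE = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id={}"


def ncbi_link(sequence_id):
    """Return the NCBI URL for a gi-style sequence ID, or None if there is no gi number."""
    match = GI_PATTERN.search(sequence_id)
    if match is None:
        return None
    return URL_TEMPLATE.format(match.group(1))


if __name__ == "__main__":
    print(ncbi_link("gi|12805189|gb|AAH02053.1|"))
```

For the example sequence this gives a URL ending in id=12805189, and an ID with no gi number in it produces no link at all.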

have fun!
Jim.

ps. the $SEQUENCE_ID=//=$ format is documented in the help.
pps. thanks to Bernd Brandt of IBIVU for constructing the NCBI regex!


--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.