michael wilson (BI) wrote:
I really like the Jalview editor,
but I often have multisequence fasta files which have protein id from ncbi.
This is a real problem, which I hope to address in the next major release. Jalview is rather Euro-centric at the moment, but there are plenty of services that we can now tap to make the jump between Entrez and UniProt IDs (and hopefully more soon).
This doesn't allow me to use the links to info about the protein that are available with UniProt IDs. Is there some way I can make the links from gi| to UniProt in Jalview, without doing a manual lookup and translation?
There isn't a direct way of doing the translation at the moment, and I can't point you to a service on the web that will do ID transliteration for a sequence alignment (i.e. paste in a FASTA file with one set of IDs, and get back another FASTA file with the IDs transliterated).
However, here is a quick fix for gi numbers using the user-definable regex URL facility:
1. Open the preferences dialog and select the connections tab.
2. Click the New button to create a new URL link.
3. Enter NCBI as the link name
4. Enter 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=$SEQUENCE_ID=/(?:gi\|(\d+))/=$' (excluding the quotes) as the link URL.
5. Select OK, then select OK again to save the preferences.
6. Try opening the following sequence in Jalview and right-clicking to get the links:
>gi|12805189|gb|AAH02053.1| Capping protein (actin filament) muscle Z-line, beta [Mus musculus]
Hopefully you will see an NCBI link, and selecting it should take you to the record at NCBI.
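
For anyone curious what the regex in step 4 actually picks out, here is a rough Python sketch. It is only an illustration of the pattern, not how Jalview implements the link facility: the function name ncbi_link is made up for this example, and it assumes the matched gi number is simply substituted where the $SEQUENCE_ID=/.../=$ token sits in the URL.

```python
import re

# URL from step 4, with the $SEQUENCE_ID=/regex/=$ token removed from the end.
URL_PREFIX = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id="

# The regex from step 4: match "gi|" followed by digits, capturing the digits.
GI_PATTERN = re.compile(r"(?:gi\|(\d+))")


def ncbi_link(sequence_id):
    """Hypothetical helper: build the NCBI URL for a gi-style sequence ID.

    Returns None when the ID contains no "gi|<number>" part.
    """
    match = GI_PATTERN.search(sequence_id)
    if match is None:
        return None
    # Assumption: the captured group replaces the token in the URL.
    return URL_PREFIX + match.group(1)


if __name__ == "__main__":
    print(ncbi_link("gi|12805189|gb|AAH02053.1|"))
```

An ID without a gi| field (a UniProt-style name, say) gives None here, which corresponds to no NCBI link being offered for that sequence.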
ps. the $SEQUENCE_ID=//=$ format is documented in the help.
pps. thanks to Bernd Brandt of IBIVU for constructing the NCBI regex!
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.