I have been creating Jalview feature files to annotate structural
features of pdb structures. So that pdb residue number and sequence
position match up I have been inserting 'U' residues in the sequence
where residues are missing in the structure. I realise this is somewhat
unusual but it clearly shows (usually) disordered residues when
viewing an alignment. It also has the advantage that alignment programs
'know' there is a residue there so that the correct spacing is
preserved. All this works nicely until I try and "Fetch DB References".
When I do this, my sequence with 'U's in it doesn't match the DB
sequence and no match is found. Can you help me find a way round this?
We can certainly try. Unfortunately, Jalview behaves in a similar fashion to the alignment programs: it knows that U is a residue (selenocysteine) and treats it accordingly. It would be possible to get the Jalview sequence matcher to ignore selenocysteines when comparing the database sequence against the sequence in the alignment, but it's probably not something that most people would want to have enabled by default.
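To make the idea concrete, here is a minimal sketch of what "ignoring U" during matching could look like. It is not Jalview's matcher: the function name `find_with_placeholder` and its parameters are hypothetical, and it simply treats every U in the alignment sequence as a wildcard while sliding along the database sequence. Note that a genuine selenocysteine would be treated as a wildcard too, which is one reason this would not make a good default.

```python
def find_with_placeholder(query, reference, placeholder="U"):
    """Return the 0-based offset in `reference` where `query` matches, or -1.

    Hypothetical helper: any `placeholder` character in `query` is allowed
    to match any residue in `reference`; all other positions must be equal.
    """
    query = query.upper()
    reference = reference.upper()
    # Try every window of the reference that is as long as the query.
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        if all(q == placeholder or q == r for q, r in zip(query, window)):
            return offset
    return -1


if __name__ == "__main__":
    # Alignment sequence with two 'U' fillers, and a longer database sequence.
    print(find_with_placeholder("ACUUDE", "MMACGHDEKK"))
```

The returned offset is what you would need to map positions in the padded sequence back onto the database sequence.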
The obvious route is to change the way that you work, but I'm not sure how invested you are in your current approach, and so how much effort it would take to fix or revise your existing annotation files and alignments. Normally, the way to achieve what you ask would be to simply use the real sequences in your alignments, and let Jalview deal with residues that are missing coordinates automatically (by aligning the residues that have coordinate data to the sequence). Jalview annotates any residue with coordinates using the 'PDBRESNUM' feature, so you can see which ones are not found in the pdb structure by virtue of the fact that no PDBRESNUM annotation is present at that position.
If the above is not sufficient, do you want to preserve the utility of your original approach by having Jalview automatically highlight regions that have no structure coordinates?
It would be possible to have Jalview automatically add a complement to the PDBRESNUM annotated sequence positions, that is, a feature covering every position that has no PDBRESNUM annotation. Personally, though, I would also want to have some geometry checks to make sure that there really is a chain break in the model before I'd indicate that un-mapped residues correspond to disordered regions.
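As a rough illustration of those two steps (this is a sketch, not anything Jalview currently does), the code below computes the complement of a set of annotated sequence positions and applies one very simple geometry check. The names `unmapped_ranges` and `looks_like_chain_break` are hypothetical, positions are assumed to be 1-based, and the 4.2 Angstrom cutoff is an assumed threshold chosen to sit a little above the roughly 3.8 Angstrom spacing of consecutive C-alpha atoms.

```python
import math


def unmapped_ranges(seq_length, mapped_positions):
    """Return 1-based inclusive (start, end) ranges with no mapped residue."""
    mapped = set(mapped_positions)
    ranges = []
    start = None
    for pos in range(1, seq_length + 1):
        if pos not in mapped:
            if start is None:
                start = pos          # open a new unmapped run
        elif start is not None:
            ranges.append((start, pos - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, seq_length))   # run extends to the C-terminus
    return ranges


def looks_like_chain_break(ca_before, ca_after, max_bonded_distance=4.2):
    """True if the C-alpha atoms flanking a gap are too far apart to be
    consecutive residues. The cutoff (in Angstroms) is an assumption."""
    return math.dist(ca_before, ca_after) > max_bonded_distance


if __name__ == "__main__":
    # Positions 4-5 and 8-9 of a 10-residue sequence have no coordinates.
    print(unmapped_ranges(10, [1, 2, 3, 6, 7, 10]))
    print(looks_like_chain_break((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)))
```

A fuller check would also confirm that the gap in space can actually be spanned by the number of missing residues, but the distance test alone already catches cases where the flanking residues are in fact directly bonded.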
I also have another question about the coordinate space of the structural annotation that you are generating. If you are already working in the 'expressed sequence' coordinate system rather than the PDB numbering, then you shouldn't need to change your existing feature files if you simply use the real sequences rather than the ones with chain breaks replaced by U's. Is that the case, or do you also need to 'lift over' your structure annotation onto the 'expressed sequence' coordinate space?
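If a 'lift over' does turn out to be necessary, it amounts to renumbering each feature through a residue-number-to-sequence-position map. The sketch below assumes you already have such a map (for example, derived from the PDBRESNUM annotation); the function name `lift_over`, the `(start, end, label)` tuple layout and the example values are all hypothetical and are not tied to the Jalview feature file format.

```python
def lift_over(features, resnum_to_seqpos):
    """Renumber (start, end, label) features from PDB residue numbers to
    sequence positions.

    Returns (lifted, skipped): features whose start or end residue has no
    entry in the map are reported in `skipped` rather than guessed at.
    """
    lifted = []
    skipped = []
    for start, end, label in features:
        if start in resnum_to_seqpos and end in resnum_to_seqpos:
            lifted.append((resnum_to_seqpos[start], resnum_to_seqpos[end], label))
        else:
            skipped.append((start, end, label))
    return lifted, skipped


if __name__ == "__main__":
    # Made-up example: PDB numbering starts at 101, sequence numbering at 1,
    # and residues 104-105 have no coordinates, so they are absent from the map.
    mapping = {101: 1, 102: 2, 103: 3, 106: 6, 107: 7}
    features = [(101, 103, "helix"), (104, 106, "loop")]
    print(lift_over(features, mapping))
```

Features that start or end in a region without coordinates are returned separately so that they can be reviewed by hand.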
On 24/11/2010 10:15, William Ross Pitt wrote:
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.