colour by conservation - any better option?

TW001 · 6 February 2023 17:03

I have seen in large sequence alignments that the conservation score is 0 or “-” at all positions. I think this is because the conservation score measures the number of amino acid characteristics (e.g. “small” or “polar” or “positive”) which are true or false for ALL of the sequences in the alignment. The higher the number of sequences the more likely it is that at least one sequence will differ at a given position, meaning a particular characteristic cannot be added to the list of characteristics that are true at that position.
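To make the effect described above concrete, here is a minimal sketch of a property-based score. The property sets are a simplified illustration and not Jalview's actual table, and treating any gap as a zero score is an assumption made for this example. A property only counts if every residue in the column has it, or every residue lacks it.

```python
# Simplified, illustrative property sets (an assumption, NOT Jalview's table).
PROPERTIES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "charged": set("KRHDE"),
    "aromatic": set("FWYH"),
    "small": set("AGSCTDNPV"),
}


def property_conservation(column):
    """Count the properties on which every residue in the column agrees."""
    residues = set(column.upper())
    if "-" in residues:
        return 0  # assumption for this sketch: any gap zeroes the score
    score = 0
    for members in PROPERTIES.values():
        has_property = [residue in members for residue in residues]
        # The property is conserved if all residues have it or all lack it.
        if all(has_property) or not any(has_property):
            score += 1
    return score


print(property_conservation("KRKR"))  # all residues positively charged
print(property_conservation("KRKD"))  # one outlier breaks several properties
```

With these illustrative sets, a single differing sequence is enough to remove a property from the count, which is why the score tends towards zero as more sequences are added.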

Nonetheless, there must be some way of scoring how conserved a position is (i.e. the chemical similarity of the amino acids at a given position in the alignment), no matter how large the alignment? I can’t find this in Jalview. There are variations similar to this - e.g. colour by conservation, BLOSUM62 score or percentage identity… but I would prefer to use a score of how similar the amino acids at a given position are, without the score focussing on the consensus amino acid or requiring all amino acids to match a category. Is that possible?
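As a rough illustration of the kind of score asked for here, the sketch below computes Shannon entropy over chemical groups for one alignment column and rescales it so that 1.0 means all residues fall in one group. The grouping is an illustrative assumption, not a standard and not something Jalview provides under this name. It does not refer to a consensus residue and does not require every sequence to match.

```python
import math
from collections import Counter

# Illustrative grouping of residues by chemistry (an assumption, not a standard).
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
RESIDUE_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}


def group_entropy_conservation(column):
    """Return 1.0 for a column within one chemical group, falling to 0.0 as groups mix."""
    # Gaps and unknown characters are simply ignored in this sketch.
    groups = [RESIDUE_TO_GROUP[aa] for aa in column.upper() if aa in RESIDUE_TO_GROUP]
    if not groups:
        return 0.0
    counts = Counter(groups)
    total = len(groups)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Normalise by the maximum possible entropy over the defined groups.
    return 1.0 - entropy / math.log2(len(GROUPS))


print(group_entropy_conservation("KRKRKRKRKR"))  # one group only
print(group_entropy_conservation("KRKRKRKRKD"))  # one outlier lowers the score only slightly
```

Because the score is based on frequencies, one unusual sequence in a large alignment reduces it a little rather than setting it to zero.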

TW001 · 6 February 2023 17:37

I opened a sequence alignment in SnapGene Viewer, and the conservation score appears to vary between 1 and 100, suggesting that the scoring system is more like the desired one that I describe.

geoff.barton · 6 February 2023 17:41

Jim Procter will be able to answer this more completely, but he is away at the moment. In the meantime: the default conservation score in Jalview is Zvelebil’s method, which is sensitive to gaps in a column. And of course, if your alignment has a lot of columns that do not show much conservation, then you will see zeros. In the implementation in AMAS (https://www.compbio.dundee.ac.uk/www-amas/) you can tune the number of gaps allowed per column, but I don’t think we built that into Jalview.

This was mainly because, in Jalview, what I usually do is first cluster a big alignment, then select subsets for analysis, because often in a big alignment there are just a few sequences that cause the issues. Or, simply colour by groups, having first clustered and cut the tree at a suitable point to filter out the outlier sequences from the initial analysis.

There is a video that takes you through these steps if you are not sure how to do it. I hope this helps, Geoff.

P.S. You can also calculate different conservation methods in Jalview - see the “Web Service” menu. There are currently 17 different conservation methods so one of those might fit the behaviour you are looking for.

geoff.barton · 6 February 2023 17:45

I just selected a few different conservation methods to illustrate the point here…

TW001 · 6 February 2023 17:56

Could you tell me more about how to change conservation method? I couldn’t find anything about conservation method in the web service or other menus.

geoff.barton · 7 February 2023 09:28

To activate different conservation methods go to:

Web Service → Conservation → Change AA Con Settings

You can select which methods you want to see under the alignment. These update as you edit the alignment. Sometimes it is good to turn off the automatic update if you are editing a really big alignment. That option is also in the menu.

As I said before though, for large alignments I recommend that you first cluster the alignment and select the most informative regions/sequences to consider physico-chemical properties in each column or between sub groups in your sequence set. It was exactly for this kind of analysis that Jalview was developed to make it easy to work with big alignments. You might find this short video helpful in doing this kind of analysis:

geoff.barton · 7 February 2023 09:35

You may also find the videos on selecting groups of sequences helpful in your analysis. Please see the Jalview YouTube channel for the full set. The videos are short and cover specific functions.

https://www.youtube.com/@jalviewdundeeresourceonlin5424

There are also some tutorials on the Jalview website that might be helpful. See exercises 15-19 on this page, which cover working with trees and subsetting alignments.

https://www.jalview.org/tutorial/exercises/

Geoff.

geoff.barton · 7 February 2023 09:37

Of course, if you have further questions, just get back to us here.

Geoff Barton

TW001 · 7 February 2023 10:29

I want to colour a .pdb file by conservation. Although I can show different conservation score method profiles below the alignment, I can’t see any option to colour the alignment according to different conservation score methods. Can this be done? Alternatively, can I export a list of conservation scores, so that I can manually put this in the b factor column of a .pdb file?
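For the manual route mentioned above, a minimal sketch of writing scores into the B-factor field is given here. It assumes the scores have already been exported and mapped from alignment columns to PDB residue numbers; the `scores` dictionary and the function name are hypothetical. In the fixed-width PDB format the chain ID is in column 22, the residue number in columns 23-26 and the B-factor in columns 61-66.

```python
def set_b_factors(pdb_lines, scores, default=0.0):
    """Return PDB lines with the B-factor field replaced by per-residue scores.

    pdb_lines: lines without trailing newlines (e.g. from str.splitlines()).
    scores: hypothetical mapping of (chain_id, residue_number) -> float.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]           # column 22
            resnum = int(line[22:26])  # columns 23-26
            value = scores.get((chain, resnum), default)
            line = line.ljust(66)      # make sure the B-factor field exists
            # Columns 61-66 hold the B-factor, written as a 6-character float.
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        out.append(line)
    return out
```

Typical use would be to read the file with `open(path).read().splitlines()`, pass the lines through `set_b_factors`, and write the result back joined with newlines. Residues without a score receive the `default` value.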

TW001 · 7 February 2023 10:33

This was a temporary issue, for an unknown reason.

TW001 · 7 February 2023 12:18

This is done using the option to colour by annotation.