How to get a count of conserved positions of an alignment?

Hello! My name is Francisco, I am a Biology student who is using Jalview for his final degree project.

Could I use Jalview in a simple way to deduce, from an amino acid sequence alignment, the fraction of conserved positions? For example, be able to say that the alignment shows that 45/300 positions (15%) are conserved in all the sequences…

If not with Jalview, is there an alternative that would make this possible?

Thank you in advance for your attention and apologies for the inconvenience.

Hi Francisco!

The Consensus annotation (Annotations → Show alignment related) is almost what you need, but I don’t think there’s a quick GUI way to get the precise percentage.
However, there is a relatively quick way to do it using the Groovy Console. I found a similar Groovy script in this post by @jalviewbugreporter:
(view only residues with a consensus score lower than threshold - #2 by jalviewbugreporter)
and adapted it to print the information you’re after (which I think is the number of alignment columns with 100% consensus, expressed as a percentage of all columns). I’ve left in the column hiding/showing part of the script because it’s visually neat!

To use the script:

  • Open Jalview and open the alignment(s) that you’re interested in.
  • Open the Groovy Console (Tools → Groovy Console…)

A new window opens with a white top half for the script and a yellow bottom half for the output.

  • Copy the following script and paste it into the script editor (the top half):
def consThresh = 100f;

def alf = Jalview.getAlignFrames();
for (ala in alf)
{
   // ala is a jalview.gui.AlignFrame object
   // get the alignment consensus annotation
   def alcons = ala.viewport.getAlignmentConsensusAnnotation();
   // and mark columns in a column selection object
   jalview.datamodel.ColumnSelection cs = ala.viewport.getColumnSelection();
   if (cs == null) {
     cs = new jalview.datamodel.ColumnSelection();
     ala.viewport.setColumnSelection(cs);
   } else {
     cs.clear();
   }
   int p=0;
   int count = 0;
   for (q in alcons.annotations)
   {
     if (q!=null && q.value>=consThresh)
     {
         cs.addElement(p);
         count++;
     }
     p++
   }
   // lastly simulate a 'SHIFT+CTRL+H' to hide unmarked regions
   ala.hideAllButSelection_actionPerformed(null);
   // print the percentage of consensus positions
   printf("%s: There are %d/%d = %3.2f%% residues with at least %3.1f%% consensus\n", ala.getTitle(), count, p, (count/p*100), consThresh);  
}
  • Then click on the execute script button (the icon second from the right at the top of the Groovy Console window).

The script then hides every column that does not have 100% consensus (a column has 100% consensus when all of its residues are the same and it contains no gaps). If you scroll down to the bottom of the Groovy Console output (the yellow half at the bottom), you should see one line of output per alignment, something like:

/home/user/jalview/examples/uniref50.fa: There are 34/157 = 21.66% residues with at least 100.0% consensus
/home/user/test_fab41.result/sample.a2m: There are 1/379 = 0.26% residues with at least 100.0% consensus

You can click on the execute button again to unhide the columns.

If you want to adjust the threshold (lower than 100%) then edit the value of consThresh in the top line, and of course you can edit that printf line near the bottom to get the output in a format you want (CSV?), especially if there are a lot of alignments.
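
As a rough idea of what a CSV-style output could look like, here is a minimal sketch in plain Groovy. The helper name csvRow and the column order are my own assumptions, not anything Jalview provides; inside the script above you would pass it ala.getTitle(), count, p and consThresh in place of the printf call.

```groovy
// Hypothetical helper (not part of Jalview): format one alignment's result as a CSV row.
String csvRow(String title, int count, int total, float thresh) {
    // Guard against an empty alignment to avoid dividing by zero.
    double pct = total > 0 ? 100.0d * count / total : 0.0d
    // Quote the title and double any embedded quotes, since file paths may contain commas.
    String quoted = '"' + title.replace('"', '""') + '"'
    // Locale.ROOT keeps '.' as the decimal separator whatever the system locale is.
    return String.format(Locale.ROOT, "%s,%d,%d,%.2f,%.1f", quoted, count, total, pct, thresh)
}

// Header row followed by one data row, using the numbers from the example output above.
println "alignment,conserved,total,percent,threshold"
println csvRow("/home/user/jalview/examples/uniref50.fa", 34, 157, 100f)
```

Printing the header once before the loop and one row per alignment should give something you can paste straight into a spreadsheet.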

I hope this gives you what you’re after – let us know how you get on!

Ben

Hi Francisco and hi Ben,

the script you recommended is very good, it worked perfectly for me when I just tested it.

However, there’s one caveat: the script gives you the percentage relative to the total alignment length, not relative to the length of your sequence of interest. You’ll have to state that clearly in your results.
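
To make the difference concrete, here is a small sketch in plain Groovy that does not touch the Jalview API at all. The three sequences are made-up toy data, the first row is arbitrarily treated as the sequence of interest, and "conserved" is assumed to mean what Ben described (identical residue in every sequence, no gaps).

```groovy
// Toy alignment (made-up data): each row is an aligned sequence, '-' marks a gap.
def aln = [
    "MK-TAYIA",
    "MKQTAYLA",
    "MK-TGYIA",
]
def ref = aln[0]              // assumption: the first row is the sequence of interest
int alnLen = ref.length()     // total number of alignment columns

// A column is conserved when all sequences share one residue and none has a gap.
def conserved = (0..<alnLen).findAll { col ->
    def residues = aln.collect { it[col] } as Set
    residues.size() == 1 && !residues.contains('-')
}

// Length of the sequence of interest once its gap characters are removed.
int refLen = ref.replace('-', '').length()

// Same count of conserved columns, two different denominators.
double pctOfAlignment = 100.0d * conserved.size() / alnLen
double pctOfReference = 100.0d * conserved.size() / refLen
println String.format(Locale.ROOT, "%d/%d = %.2f%% of alignment columns",
        conserved.size(), alnLen, pctOfAlignment)
println String.format(Locale.ROOT, "%d/%d = %.2f%% of the sequence of interest",
        conserved.size(), refLen, pctOfReference)
```

Because gap columns only inflate the alignment length, the second percentage can never be lower than the first, which is why it matters which one you report.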

cheers and good luck with your project.

Arnulf
