Batch calculation of consensus sequence

Hi,

I’m trying to calculate the consensus sequence of 172 multiple sequence alignments. It would drive me insane to do look this up manually. Is there any way to calculate the consensus using the CLI or to process my files as a batch?

Thanks!

Hi Claire,

This is a great use case for Jalview’s new command line interface (batch processing files) so I’m very glad you asked!

Whilst the new command line will handle batch processing of the file contents, and will write new files (or can overwrite – use with caution!), we still need a little bit of groovy to get hold of the consensus.

The quickest way to do this is to make a small groovy script file (called, e.g. add_consensus.groovy) that contains:

def view = currentAlFrame.getViewport();
def seq = view.getConsensusSeq();
view.getAlignment().addSequence(seq);

If all your FASTA files are in a folder fastafolder then you can run

jalview --open="fastafolder/*.fa" --groovy=add_consensus.groovy --output="fastafolder/*_consensus.fa" --close

which will create a file fastafolder/original_filename_consensus.fa for each FASTA file, that contains the original alignment with an extra sequence line containing the Consensus sequence.

As an aside explaining the command line parts:

  • --open is straightforward, giving a list of files for jalview to open in turn.
  • --groovy tells Jalview to run this groovy script for each opened alignment.
  • --output gives a filename “template” to use for the output file.
    A “*” used after the last “/” gets replaced with the basename of the input file (i.e. the filename without the path or extension). You can also use the string “{basename}” to do the same.
  • --close tells Jalview to close each alignment after processing and outputting, before moving on to the next, so there isn’t a build up of memory usage.

Note that where files are being output using the command line, Jalview will automatically run in headless mode. If for some reason you want to watch Jalview doing this (why? it’s fun!) you can add --gui. Headless mode is obviously better for running on a remote server though.

This might be all you need to do, but I’ll post some follow-up improvements/variations to this in case your requirements are different. In particular, this method of taking the Consensus sequence removes the + symbols that show ambiguous positions where there are multiple Amino Acids with the modal number of occurrences, and replaces it with the alphabetically first one. I’ll show how to preserve the + in a follow-up post.

Ben

In order to preserve the “+” symbols in the Consensus sequence, just a few more lines are needed in the groovy script (called, e.g. add_ambiguity_consensus.groovy):

def view = currentAlFrame.getViewport();
def seq = view.getConsensusSeq();
def ann = view.getAlignmentConsensusAnnotation();
def chars = ann.annotations.collect { it.displayCharacter; }
seq.setSequence(new String((char[])chars.toArray()));
view.getAlignment().addSequence(seq);

and you can run Jalview from the command line as before.

If you have a lot of FASTA files in folders and subfolders, you can get Jalview to find them all by using a Java-style “**” wildcard that matches against folders as well as filenames like this:

jalview --open="fastafolder/**.fa" --groovy=add_ambiguity_consensus.groovy --output="*/*_consensus.fa" --close
  • In --open, “fastafolder/**.fa” will match .fa files in fastafolder or subfolders of fastafolder.
  • In --output, we need to use the folder path for the input files too which is what the “*” at the start of the filename template and before the last “/” does. You can also use “{dirname}” to do the same.

Jalview should not overwrite files unless you add an --overwrite argument.

If all you want is the Consensus sequence in a separate file (without the original alignment), you can delete the other sequences in the alignment and be sure to save to a different file otherwise you could lose the original alignments.

Using another groovy file (called, e.g. only_ambiguity_consensus.groovy), containing:

def view = currentAlFrame.getViewport();
def seq = view.getConsensusSeq();
def ann = view.getAlignmentConsensusAnnotation();
def chars = ann.annotations.collect { it.displayCharacter; }
seq.setSequence(new String((char[])chars.toArray()));
def al = view.getAlignment();
def num = al.getSequences().size();
num.times { al.deleteSequence(0); }
al.addSequence(seq);

and run on the command line as

jalview --open="fastafolder/**.fa" --groovy=only_ambiguity_consensus.groovy --output="*/*_consensus.fa" --close

If you want to save the consensus only files in a completely separate folder you can do

jalview --open="fastafolder/**.fa" --groovy=only_ambiguity_consensus.groovy --output="consensusfiles/*/*.fa" --close --mkdirs

--mkdirs tells Jalview to create the subfolders that probably don’t exist yet.

Hopefully one of these variations covers what it is you want to do with the Consensus sequence, but if not let us know some more details!

You can read more about Jalview’s new command line at https://www.jalview.org/help/html/features/clarguments.html .

Ben

Thank you so much for the detailed response! This was super helpful!

1 Like