Labeling group after alignment

Hi everyone!

I have done a nucleotide alignment using 1200 sequences (from different genes) ranging from 79 nt to 600 nt long. Thus, I have a considerable degree of variation in length.

I would like to label two groups to investigate whether similarity is higher in one group than in the other. How can I label the sequences and calculate the percentage similarity within each group?

Thank you for any help

Gabriel
PhD student

Hi Gabriel,

Hopefully I’ve understood the problem. I’ll have a go at some suggestions, breaking down the steps involved.

The first stage is creating the two groups of sequences.
You can create a group of sequences by selecting multiple sequences (best done by clicking on the sequence label on the left). You can do this with the usual selection shortcuts:

  • click and drag (to select several contiguous sequences),
  • hold the Shift key while clicking (to select all sequences between the previous click and this click),
  • hold the Control key while clicking (to add a non-contiguous sequence to the current selection).

I don’t know if your sequences are already arranged into an order that makes it easy to select your two groups, but you could try the Calculate → Sort options to re-arrange them first.
You can also try ordering the sequences using a Neighbour Joining tree. Choose Calculate → Calculate Tree or PCA…, then Calculate the tree. In the resulting tree window, you can click on a position along the tree to create groups of sequences as they appear at that point along the tree, although this may give too many groups. Either way, you can then choose View → Sort Alignment By Tree to sort the sequences in the alignment according to their vertical ordering in the tree, which should make it easier to group your sequences manually. If you want to remove the groups the tree has made, choose Select → Undefine Groups in the alignment window. The ordering will remain, though, which could be helpful for selecting your groups.

Once you have selected the sequences for the first group, you can choose Select → Create Group. If you want a specific label for the group, you can instead right-click on the alignment and choose Selection → Edit New Group → Edit name and description of current group.

Now you want to select the other sequences for the second group. Unfortunately, if you choose Select → Invert Sequence Selection at this stage, it also inverts the first group, so you will have to select the other sequences manually.
Once selected, you can create the second group in the same way as before.
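
If you would rather decide which sequence belongs to which group outside the GUI, the same idea can be sketched in a few lines of Python. This is only an illustration and not a Jalview feature: the sequence IDs, the ID prefixes and the group names below are all invented, so substitute whatever actually distinguishes your two sets.

```python
# Hypothetical aligned sequences keyed by ID (IDs and residues are invented).
alignment = {
    "geneA_001": "ACGT-ACGT",
    "geneA_002": "ACGTTACGT",
    "geneB_001": "AC--TACGA",
    "geneB_002": "ACG-TACGA",
}


def split_into_groups(aligned, prefix_to_group):
    """Assign each sequence to a named group based on its ID prefix.

    Sequences whose ID matches none of the prefixes are left out.
    """
    groups = {name: {} for name in prefix_to_group.values()}
    for seq_id, seq in aligned.items():
        for prefix, name in prefix_to_group.items():
            if seq_id.startswith(prefix):
                groups[name][seq_id] = seq
                break
    return groups


# Assumed rule: the prefix of the ID decides the group.
groups = split_into_groups(alignment, {"geneA": "group1", "geneB": "group2"})
for name, members in groups.items():
    print(name, sorted(members))
```

Keeping the groups as plain dictionaries makes it easy to pass each one to whatever similarity calculation you choose later.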

You now have two groups defined. In the Annotations → Autocalculated Annotation menu, make sure that Apply to all groups, Show Consensus Histogram and Group Consensus are all checked, and that Annotations → Show annotations is checked too!

You should see three Consensus histograms and sequences, one for each group and one for the whole alignment. Mousing over a consensus shows the percentage identity for that position (for that group).
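
To make clear what a per-position percentage identity means, and to get a single number per group, here is a rough Python sketch. It is not Jalview's own code: the function names, the example sequences and the choice to ignore gaps are my assumptions, and the way Jalview counts gaps may differ, so treat the numbers as indicative only.

```python
from collections import Counter


def column_identity(seqs, ignore_gaps=True):
    """Per column, the percentage of sequences carrying the most common residue."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for col in zip(*seqs):
        # Assumption: '-' is the gap character; optionally drop it from the count.
        residues = [c.upper() for c in col if not (ignore_gaps and c == "-")]
        if not residues:
            result.append(0.0)  # column contains only gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        result.append(100.0 * top_count / len(residues))
    return result


def mean_identity(seqs, ignore_gaps=True):
    """Average of the per-column percentages: one summary value for a group."""
    per_column = column_identity(seqs, ignore_gaps)
    return sum(per_column) / len(per_column)


# Invented example groups, already aligned to the same length.
group1 = ["ACGT-ACGT", "ACGTTACGT", "ACGTTACGA"]
group2 = ["AC--TACGA", "ACG-TACGA", "TCGATACGA"]
print("group1:", round(mean_identity(group1), 1))
print("group2:", round(mean_identity(group2), 1))
```

With sequences ranging from 79 nt to 600 nt, many columns will be mostly gaps, so whether gaps are counted can change the result a lot. It is worth trying both settings of `ignore_gaps` before comparing the two groups.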

If you want to rename either a sequence or an annotation, you can right-click on the label and look for the Edit Name/Description and Edit Label/Description entries.

Hopefully this is close to what you were after but feel free to ask again if it wasn’t!

Ben

Thank you so much for your answer! Sorry for the late reply.
