sorting by sequence selection

Hello,

Is there a way to sort all the sequences, or a subset of them, in a Jalview session by some selected columns? For instance, I have a gapped column in 1000 sequences and it’s a pain to find the inserted residue. If I could just sort by that column, the sequence with the insertion would move to the top or the bottom of the alignment. In other cases, I am looking for different mutations at some positions in antibodies, and it would help if I could select 3 columns from a set of sequences and then reorder the whole sequences in alphabetical order of the selected 3-residue segment in each.

Thanks,
Roland

Roland Dunbrack (he/his/him/him)
Institute for Cancer Research
Fox Chase Cancer Center
Philadelphia PA 19111
http://dunbrack.fccc.edu
http://dunbrack.org

Hi Roland,

One possibility is to select the columns and build a tree based on these columns (Calculate=>Calculate Tree or PCA).

Then, in the tree window, you can sort the alignment by tree order (View=>Sort Alignment by Tree).

Best regards,

Romain

(In reply to Roland Dunbrack’s message of Saturday, July 4, 2020, 11:34 AM.)

Hi Roland,

Another possibility you might try (which for your first use case will do exactly what you want, and for your second use case get you most of the way):

  1. Select the column(s) you are interested in:
    to select a single column, click at the top of that column;
    to select multiple contiguous columns, click and drag across the tops of the columns;
    to select multiple non-contiguous columns, select the first column(s), then hold down the Ctrl key (or Cmd on a Mac) whilst selecting further column(s).

  2. Click on “Select” → “Make Groups for Selection” which will create a group for each different combination of residues/gaps in your selected columns. The groups themselves will currently be scattered throughout the sequences, so…

  3. Click on “Calculate” → “Sort” → “By Group” which will then re-arrange the sequences so that sequences in the same group are together. These groups will be ordered from largest group to smallest group(s), or the other way round. Select Sort By Group again to reverse the order they’re in.

In your first use case, you should end up with all the sequences that have a gap in the selected column grouped together, followed by the single sequence with a residue in that column (or the other way round if you reverse the order).

In the second use case it won’t be in alphabetical order, but in order of size of groups (i.e. occurrences of each residue/gap combination). You might find it easier to see what’s going on in those groups/columns by selecting “View” → “Hide” → “All but Selected Region” (assuming you still have the relevant columns selected). When you’ve found your sequences of interest, select them (click on the ID) and you can bring back the hidden columns with “View” → “Show” → “All Columns”.
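To make the grouping and sorting behaviour described above concrete, here is a minimal sketch in plain Python. It is not Jalview code and does not use the Jalview API; the function name `sort_by_column_groups` and the toy alignment are hypothetical, and it assumes every sequence is already aligned to the same length.

```python
from collections import defaultdict


def sort_by_column_groups(alignment, columns, largest_first=True):
    """Group sequences by the residues/gaps found in `columns`, then
    return them with each group kept together, ordered by group size.

    alignment: list of (name, aligned_sequence) tuples (hypothetical format)
    columns:   0-based column indices to group on
    """
    groups = defaultdict(list)
    for name, seq in alignment:
        # The group key is the residue/gap combination in the chosen columns.
        key = "".join(seq[c] for c in columns)
        groups[key].append((name, seq))
    # sorted() is stable, so groups of equal size keep their first-seen order.
    ordered = sorted(groups.values(), key=len, reverse=largest_first)
    return [entry for group in ordered for entry in group]


# Toy example: only seq3 has a residue in column 2.
alignment = [
    ("seq1", "AC-GT"),
    ("seq2", "AC-GT"),
    ("seq3", "ACWGT"),
    ("seq4", "AC-GT"),
]

for name, seq in sort_by_column_groups(alignment, [2], largest_first=False):
    print(name, seq)
```

With `largest_first=False` the single sequence carrying the insertion comes out first, which mirrors reversing the order with a second Sort By Group.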

Hope I’ve understood correctly and this helps,

Ben

Hi Ben,

Thanks for the suggestions. The sorting by group worked pretty well.

There are some situations where alphabetical sorting would be helpful (to get all residues of a certain type together at the first position, and then sorted by the second position). I guess I can group on a single column, but then that doesn’t sort the second column. But I can work with the sorting by group.
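For illustration only, the alphabetical ordering described here could be sketched outside Jalview in plain Python. The function name `sort_by_columns_alphabetically`, the choice to place gaps after residues, and the example sequences are all assumptions made for this sketch; none of it is a Jalview feature or API.

```python
def sort_by_columns_alphabetically(alignment, columns, gap_chars="-."):
    """Reorder whole sequences by the residues in `columns`, comparing the
    first selected column first, then the second, and so on.

    alignment: list of (name, aligned_sequence) tuples (hypothetical format)
    columns:   0-based column indices, in order of sorting priority
    Gaps are placed after all residues at each position (an assumption).
    """
    def key(entry):
        _, seq = entry
        parts = []
        for c in columns:
            ch = seq[c].upper()
            # False sorts before True, so residues come before gaps.
            parts.append((ch in gap_chars, ch))
        return parts

    return sorted(alignment, key=key)


# Toy example: sort on the 2-residue segment in columns 1 and 2.
antibodies = [
    ("ab1", "QVSLK"),
    ("ab2", "QATLK"),
    ("ab3", "QVALK"),
    ("ab4", "QA-LK"),
]

for name, seq in sort_by_columns_alphabetically(antibodies, [1, 2]):
    print(name, seq)
```

Here all sequences with A at the first selected position come together (ordered by the second position, gap last), followed by those with V.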

By the way, Jim Procter gave me some “Groovy” scripts to left-justify or right-justify a set of selected columns. I am finding them really, really helpful for cleaning up alignments that are very gappy in some loop regions where I don’t care about the pairwise alignments; I can left-justify those regions and then delete the empty columns. He said it was an idea to include that as a regular feature. It’s a little time-consuming to read in the left script, then the right script, and then the left script again, since it sometimes takes many passes with both scripts to clean up an entire alignment in a way that makes sense. For my two cents, I think it would be great if this were a preloaded tool that made it easy to left- or right-justify any set of selected columns.
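As a rough illustration of what left- and right-justifying a block of columns means, here is a small sketch in plain Python. It is not the Groovy scripts mentioned above and does not use Jalview; the function names `justify_columns` and `drop_empty_columns`, the half-open column range, and the use of "-" as the only gap character are assumptions for this example.

```python
def justify_columns(alignment, start, end, left=True, gap="-"):
    """Pack the residues found in columns [start, end) against the left
    (or right) edge of that block, padding the rest with gaps.

    alignment: list of (name, aligned_sequence) tuples (hypothetical format)
    """
    result = []
    for name, seq in alignment:
        block = seq[start:end]
        residues = "".join(ch for ch in block if ch != gap)
        padding = gap * (len(block) - len(residues))
        packed = residues + padding if left else padding + residues
        result.append((name, seq[:start] + packed + seq[end:]))
    return result


def drop_empty_columns(alignment, gap="-"):
    """Remove columns that contain only gaps in every sequence."""
    if not alignment:
        return []
    width = len(alignment[0][1])
    keep = [c for c in range(width)
            if any(seq[c] != gap for _, seq in alignment)]
    return [(name, "".join(seq[c] for c in keep)) for name, seq in alignment]


# Toy example: a gappy loop region in columns 2 to 4.
loop = [
    ("s1", "AC-G-TK"),
    ("s2", "ACG--TK"),
    ("s3", "AC--GTK"),
]

left_justified = justify_columns(loop, 2, 5, left=True)
for name, seq in drop_empty_columns(left_justified):
    print(name, seq)
```

After left-justifying, the two trailing columns of the block are empty in every sequence, so removing empty columns shortens the alignment by two.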

Thanks,
Roland
