Saving sequence alignments in CLUSTAL format

linda_k · 8 January 2016 15:52

Dear Jalview team,

We have encountered an issue in the way MSAs are saved in CLUSTAL format under one specific condition: when the length of the MSA (DNA or protein sequences alike) is an exact multiple of 60 (e.g. 60, 120, 180… nucleotides), the resulting CLUSTAL file contains a superfluous final block in which only the list of sample IDs is reported.
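To illustrate the symptom, here is a minimal Python sketch of how such an extra block could arise. It is not Jalview's actual code: the function `clustal_blocks`, its `buggy` flag, and the assumed cause (computing the block count as `length // width + 1` instead of rounding up) are hypothetical and only meant to reproduce the behaviour described above.

```python
def clustal_blocks(seqs, width=60, buggy=False):
    """Split an alignment into CLUSTAL-style blocks of `width` columns.

    `seqs` maps sample IDs to aligned sequences of equal length.
    Returns a list of blocks; each block is a list of text lines.
    """
    length = max(len(s) for s in seqs.values())
    if buggy:
        # Hypothetical off-by-one: one block too many when length % width == 0
        n_blocks = length // width + 1
    else:
        # Ceiling division: exactly as many blocks as are needed
        n_blocks = -(-length // width)
    id_width = max(len(name) for name in seqs) + 1
    blocks = []
    for b in range(n_blocks):
        start = b * width
        blocks.append([
            f"{name:<{id_width}}{seq[start:start + width]}".rstrip()
            for name, seq in seqs.items()
        ])
    return blocks


# Two made-up samples, 120 columns each (an exact multiple of 60)
seqs = {"sample1": "ACGT" * 30, "sample2": "ACGA" * 30}
buggy = clustal_blocks(seqs, buggy=True)
fixed = clustal_blocks(seqs)
```

With the hypothetical off-by-one, the 120-column alignment yields three blocks, the last of which holds only the sample IDs; with ceiling division it yields two.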

This is not a problem in itself, since Jalview can re-open the file without issues, but it becomes problematic when the file has to be loaded into ClustalX. In that case ClustalX appends a repeated 60-nt or 60-aa stretch of sequence to the end of each sample.

This problem can be avoided by first opening the CLUSTAL file in a text editor, removing the last block of sample IDs, saving the file again and only then opening it in ClustalX; but we have already forgotten this check on occasion and ended up with incorrect alignments.
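The manual clean-up described above could also be scripted. The following is a rough Python sketch, assuming the superfluous block is the last block in the file and that each of its lines holds a single token (the sample ID, with no sequence data). The function name `strip_trailing_id_block` is made up for illustration, and the file names in the usage comment are placeholders.

```python
def strip_trailing_id_block(text):
    """Remove a final block whose lines contain only sample IDs.

    Returns the text unchanged if the last block carries sequence data.
    """
    lines = text.splitlines()
    # Skip any blank lines at the very end of the file
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    # Walk back to the first line of the final block
    start = end
    while start > 0 and lines[start - 1].strip():
        start -= 1
    block = lines[start:end]
    # ID-only block: every line is a single token, and it is not the header
    if start > 0 and block and all(len(line.split()) == 1 for line in block):
        lines = lines[:start]
        while lines and not lines[-1].strip():
            lines.pop()
        return "\n".join(lines) + "\n"
    return text


# Usage (placeholder file names):
# with open("alignment.aln") as fh:
#     cleaned = strip_trailing_id_block(fh.read())
# with open("alignment_fixed.aln", "w") as fh:
#     fh.write(cleaned)
```

Checking the result in a text editor before loading it into ClustalX is still advisable, since this sketch has only been reasoned through on small made-up inputs.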

I realize that this is a “cross-program” problem, but since we have always been very impressed by your quick reaction times, I was wondering whether you would be able to fix this issue on your side… unless fixing it has negative consequences for other users, of course!

Thank you for your support,

Linda

t_c_n_ofoegbu · 8 January 2016 16:31

Dear Linda,

I can confirm that this is a bug in Jalview’s Clustal parser, and I’ve filed a bug report about it here.

This bug should be very quick to fix and will hopefully be resolved in the development version by next week.

Thanks and regards,
Charles

Ofoegbu Tochukwu Charles
Jalview Visual Analytics Developer/Scientist
The Barton Group
Division of Computational Biology
School of Life Sciences
University of Dundee, Dundee, Scotland, UK.
Skype: cofoegbu
www.jalview.org
www.compbio.dundee.ac.uk

Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss

The University of Dundee is a registered Scottish Charity, No: SC015096

t_c_n_ofoegbu · 11 January 2016 12:07

Hi Linda,

This is to inform you that the bug you reported last week has been fixed. It will go into the next release of Jalview.

In the meantime, however, you can access the patch from the development version of Jalview here.

Please don’t hesitate to contact us if you experience any other issues.

Thanks and regards,
Charles
