Changing numbering in alignments

Hi everybody,

I am wondering if it is possible to change the sequence numbering of sequences in any multiple sequence alignment. For example, within an MSA of 100-residue sequences, can I renumber them so that residues are designated as 4-104, 510-610, etc.? The sequences are not from a database (like a Swiss-Prot entry), and I don't see this mentioned in the documentation.

Best,
Engin

--
Engin Özkan
Post-doctoral Scholar
Howard Hughes Medical Institute
Dept of Molecular and Cellular Physiology
279 Campus Drive, Beckman Center B173
Stanford School of Medicine
Stanford, CA 94305
ph: (650)-498-7111

Dear Engin,

You're quite right: there isn't a documented option to modify the start/end numbering of a sequence, which is a small but significant oversight.

You can actually change the start/end numbering given for a sequence using the 'Edit Name/Description' dialog box. Simply add the new numbering to the sequence ID using the standard '/start-end' syntax, and it will be updated in the alignment.
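As a rough illustration of the '/start-end' convention described above, here is a minimal Python sketch that splits such a suffix off a sequence ID and writes a new one. The helper names `split_id` and `with_range` are hypothetical and not part of Jalview; the sketch only assumes that the suffix has the form `/<start>-<end>` at the very end of the ID.

```python
import re

# Hypothetical helpers for illustration only; they are not part of Jalview.
# Assumption: the numbering suffix is "/<start>-<end>" at the end of the ID.
_RANGE = re.compile(r"^(?P<name>.+)/(?P<start>\d+)-(?P<end>\d+)$")


def split_id(seq_id):
    """Return (name, start, end); start and end are None without a suffix."""
    m = _RANGE.match(seq_id)
    if m is None:
        return seq_id, None, None
    return m.group("name"), int(m.group("start")), int(m.group("end"))


def with_range(seq_id, start, end):
    """Add the '/start-end' suffix, replacing an existing one if present."""
    name, _, _ = split_id(seq_id)
    return f"{name}/{start}-{end}"


print(with_range("seqA/1-100", 510, 609))
```

The resulting string is what would be typed into the ID field of the 'Edit Name/Description' dialog; doing the same thing in bulk on the IDs of an alignment file before loading it is an untested assumption here.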

However, be warned: there is no check that the start/end you enter actually corresponds to the length of your sequence (which is probably why this was not documented!). I just played with this myself to check its behaviour when there is annotation on the alignment, and found that there are some surprising 'emergent features' that will have to be dealt with in a future release.

For the moment, should you have to edit the start/end position for sequences, make absolutely sure that the start position is correct. Then, once you've modified all the sequences you need to, select all the sequences in the alignment and copy and paste them to a new alignment window. Jalview will then update all the end positions according to the number of non-gap symbols actually in each sequence.
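The rule described here (end position follows from the start position and the number of non-gap symbols) can be checked by hand before editing an ID. The following is only a sketch of that arithmetic, not Jalview's own code; the function name `expected_end` and the set of gap characters are assumptions.

```python
# Assumption: '-', '.' and ' ' are treated as gap symbols; adjust as needed.
GAP_CHARS = "-. "


def expected_end(start, aligned_seq, gap_chars=GAP_CHARS):
    """End position implied by a start position and the non-gap symbols."""
    residues = sum(1 for c in aligned_seq if c not in gap_chars)
    return start + residues - 1


# A 100-residue sequence (with leading gaps) that starts at position 4.
print(expected_end(4, "-" * 10 + "M" * 100))
```

With this arithmetic a 100-residue sequence starting at 4 ends at 103, and one starting at 510 ends at 609, so the end value entered by hand should match `start + residues - 1`.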

Best of luck! :wink:
Jim.

ps. I've created a new feature request regarding start/end editing here: http://issues.jalview.org/browse/JAL-680

On 15/10/2010 06:28, Engin Ozkan wrote:

--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

Ommom

Sent from my iPhone

On Oct 15, 2010, at 6:23 AM, "Jim Procter" <jprocter@compbio.dundee.ac.uk> wrote:

_______________________________________________
Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss

Thanks, that does the job. This would be a valuable feature to add.

By the way, copy-and-pasting wasn't necessary: the end position was updated immediately when I changed the Name/Description of the sequence.

Engin

If you are changing options around the numbering, it would be nice to have the choice not to have numbers at all. In our application we have custom labels for thousands of sequences, and I end up having to strip the forward slash and numbers out of the labels with other programs later.

Thanks,

Jared

-----Original Message-----
From: jalview-discuss-bounces@jalview.org [mailto:jalview-discuss-bounces@jalview.org] On Behalf Of Engin Ozkan
Sent: Friday, October 15, 2010 4:48 PM
To: jalview-discuss@jalview.org
Subject: Re: [Jalview-discuss] Changing numbering in alignments

______________________________________________________________________
CAUTION: This message was sent via the Public Internet and its authenticity cannot be guaranteed.

This message sent from Smiths Detection, a division of Smiths Group.

PROPRIETARY: This e-mail contains proprietary information some or all of which may be legally privileged. It is intended for the recipient only. If an addressing or transmission error has misdirected this e-mail, please notify the authority by replying to this e-mail. If you are not the intended recipient you must not use, disclose, distribute, copy, print, or rely on this e-mail.

Hi Jared.

On 18/10/2010 13:59, Ackers, Jared (DEWN) wrote:

You can control whether numbers are included in the display using the options in the 'Format' menu, and you can also control whether numbers are appended to the sequence IDs when writing alignments to files via the 'Output' tab of the user preferences dialog box.
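For alignment files that were already written with the suffix attached, the clean-up step mentioned in the question can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a Jalview feature: the names `strip_range` and `strip_fasta_ids` are hypothetical, and the input is assumed to be FASTA-style text where ID lines start with '>' and the ID ends at the first space.

```python
import re

# Assumption: the suffix to drop is "/<start>-<end>" at the end of the ID.
_SUFFIX = re.compile(r"/\d+-\d+$")


def strip_range(label):
    """Remove a trailing '/start-end' suffix; other labels pass unchanged."""
    return _SUFFIX.sub("", label)


def strip_fasta_ids(lines):
    """Yield FASTA lines with the numbering suffix removed from each ID."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep any description after the first space untouched.
            head, sep, desc = line.partition(" ")
            yield strip_range(head) + sep + desc
        else:
            yield line


for out in strip_fasta_ids([">label1/1-5 my description\n", "ACDEF\n"]):
    print(out)
```

Labels that contain a slash but no numeric range, such as `sample/run2`, are left as they are.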

Jim.

Ah, thanks. I'm relatively new and hadn't found that one yet.

-----Original Message-----
From: Jim Procter [mailto:foreveremain@gmail.com] On Behalf Of Jim Procter
Sent: Monday, October 18, 2010 9:22 AM
To: Ackers, Jared (DEWN)
Cc: Engin Ozkan; jalview-discuss@jalview.org
Subject: Re: [Jalview-discuss] Changing numbering in alignments
