Quick Poll: how do you use the sequence ID/start-end numbering?

Hi.

Joel's recent email prompted me to go on a brief but intense bug hunt,
and one issue that emerged is that Jalview will do 'strange things' if
the sequence's end position is not correct. What I mean by this is that
if you provide an alignment with incorrect numbering, like:

my sequence/5-9

ASDQFNSQNWSTQ

Jalview will use the start position to number the first residue,
alanine, as 5, and so on, but for certain functions - including 'find'
and the jump-to-residue command in keyboard mode - Jalview will only
highlight/move as far as residue number 9, phenylalanine.
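
To make the example concrete, here is a minimal Python sketch of an end-bounded scan. It is not Jalview's code: the function name `reachable_prefix` and the set of gap characters are assumptions made purely for illustration. It only shows why a scan that stops once the declared end is reached never gets to the residues after phenylalanine.

```python
GAP_CHARS = "-. "  # assumed gap symbols, for illustration only


def reachable_prefix(aligned_seq, start, end):
    """Return the part of the row scanned before the declared end is reached."""
    pos = start - 1  # sequence position of the last residue seen so far
    for col, ch in enumerate(aligned_seq):
        if ch in GAP_CHARS:
            continue  # gaps do not consume a sequence position
        pos += 1
        if pos >= end:
            # Declared end reached: stop scanning, as an end-bounded search would.
            return aligned_seq[: col + 1]
    return aligned_seq


row = "ASDQFNSQNWSTQ"
visible = reachable_prefix(row, start=5, end=9)
print(visible)          # ASDQF
print("NW" in visible)  # False, although "NW" occurs in the full row
```

With the numbering 5-9, only the first five characters are ever considered, so a search for anything beyond phenylalanine comes back empty.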

The problem only really occurs when the user-provided end position
falls short of the actual number of sequence characters, and I can't
think of a case where one might actually want to do this. Can
anyone else?

Thanks in advance!
Jim.

ps. Please disregard any cases related to reversed sequences - Jalview
does not currently cope with 3' to 5' nucleotide or C- to N-terminal
amino acid sequences in alignments. In fact, the reason we allowed the
'end' value to vary freely was to allow such sequences to be visualised
(even if Jalview then gives completely the wrong sequence positions in
mouseovers, etc).

···

--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

Hi Jim,

I think in this case the potential for confusion outweighs any utility in being able to restrict the search region for ‘Find’ by using the user defined start and end positions.

My vote would be to make ‘Find’ work on the full length sequence and ignore the /1-X numbering.

I’ve noticed that if you select a region within an alignment (i.e. by dragging the mouse) then ‘Find’ is restricted to this region - that functionality should be enough for the case where someone wants to search within a defined region.

Cheers,

Andrew Perry

Postdoctoral Fellow
Whisstock Lab
Department of Biochemistry and Molecular Biology
Monash University, Clayton Campus, PO Box 13d, VIC, 3800, Australia.
Mobile: +61 409 808 529

On Tue, Feb 15, 2011 at 9:34 PM, Jim Procter <jprocter@compbio.dundee.ac.uk> wrote:

···

Hi Andrew, thanks for responding!

> I think in this case the potential for confusion outweighs any utility
> in being able to restrict the search region for 'Find' by using the
> user defined start and end positions.

True, although in fairness the latter is actually a bug caused by
Jalview using the end attribute purely for efficiency (to avoid
searching the tail end of an alignment row for non-gaps after the last
sequence position has already been reached). What I was actually getting
at was whether anyone would object to Jalview automatically correcting
the 'end' attribute if it discovers that it falls short of the number of
non-gap symbols in the sequence.
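
As a rough illustration of the correction being proposed, the Python sketch below raises a declared end that is too small for the residues actually present and leaves any other value alone. The name `corrected_end` and the gap characters are hypothetical; this is not Jalview's implementation.

```python
GAP_CHARS = "-. "  # assumed gap symbols, for illustration only


def corrected_end(aligned_seq, start, end):
    """Return an end position that covers every non-gap symbol in the row."""
    n_residues = sum(1 for ch in aligned_seq if ch not in GAP_CHARS)
    minimum_end = start + n_residues - 1  # position of the last residue
    # Only ever raise the end; a larger user-supplied value is left untouched.
    return max(end, minimum_end)


print(corrected_end("ASDQFNSQNWSTQ", start=5, end=9))  # 17
print(corrected_end("AS-DQ", start=1, end=10))         # 10
```

For the example above, 13 residues starting at 5 give an end of 17, so 'my sequence/5-9' would become 'my sequence/5-17'.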

> My vote would be to make 'Find' work on the full length sequence and
> ignore the /1-X numbering.

That's my preferred solution, in this case and for any of the other
cases where the end attribute was being used. It's really only useful
for visualization, and for (eventually) indicating whether a sequence is
forward or reverse sense.

> I've noticed that if you select a region within an alignment (ie by
> dragging the mouse) then 'Find' is restricted to this region - that
> functionality should be enough in the case where someone wanted to
> search within a defined region.

Glad you noticed ;) !

Jim.

···

On 17/02/2011 00:12, Andrew Perry wrote:


Jim-

Sorry for the delayed response.

I agree with Andrew: I’d prefer it if Jalview were to accept a user-defined start position and then calculate the number of the last residue in the sequence itself. I was a bit confused by some incorrect end numbers a while back, and it would be nice to have this bug fixed.

-Joel

On Thu, Feb 17, 2011 at 2:49 AM, Jim Procter <jprocter@compbio.dundee.ac.uk> wrote:

···


Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss

Ah, I see. I’d have no objections there.

Andrew Perry


On Thu, Feb 17, 2011 at 9:49 PM, Jim Procter <jprocter@compbio.dundee.ac.uk> wrote:

···