Hiding columns with many gaps

Hi everyone,

Sorry if this has been discussed before; I looked
for it and couldn't find any info.

In large multiple alignments, it can be convenient to hide positions
(columns) where more than a certain fraction of sequences have gaps.
I've seen that option in other editors (e.g. MUST). Is there a way to
do this in Jalview? If not, does anyone think it would be worthwhile
to implement?
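As a rough illustration of the criterion described here (hide a column when more than a given fraction of sequences have a gap there), the Java sketch below computes the gap fraction of each column and lists the columns that exceed a threshold. The class and method names are hypothetical and not part of Jalview. It also assumes that '-' and '.' are the only gap characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Jalview's API.
class GapFractionFilter {

    // Assumption: '-' and '.' are the only gap characters.
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    /** Returns, for each column, the fraction of sequences with a gap there. */
    static double[] gapFractions(String[] alignment) {
        int width = 0;
        for (String seq : alignment) {
            width = Math.max(width, seq.length());
        }
        double[] fractions = new double[width];
        for (int col = 0; col < width; col++) {
            int gaps = 0;
            for (String seq : alignment) {
                // Positions past the end of a shorter sequence count as gaps.
                if (col >= seq.length() || isGap(seq.charAt(col))) {
                    gaps++;
                }
            }
            fractions[col] = (double) gaps / alignment.length;
        }
        return fractions;
    }

    /** Returns the indices of columns whose gap fraction exceeds the threshold. */
    static List<Integer> columnsToHide(String[] alignment, double threshold) {
        double[] fractions = gapFractions(alignment);
        List<Integer> hide = new ArrayList<>();
        for (int col = 0; col < fractions.length; col++) {
            if (fractions[col] > threshold) {
                hide.add(col);
            }
        }
        return hide;
    }

    public static void main(String[] args) {
        String[] aln = {"AC-GT-", "A--GTC", "ACTG--"};
        System.out.println(columnsToHide(aln, 0.5)); // prints [2, 5]
    }
}
```

With a threshold of 0.5, only columns 2 and 5 (two gaps out of three sequences) are selected. Columns with a single gap are kept.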

I'd offer to do it, but my knowledge of Java is pretty much zilch. I
do know C++ though, so it might be easy to learn enough about Java to
do the job, with pointers from people who know the code well.

Thanks,
Jerome

> In large multiple alignments, it can be convenient to hide positions
> (columns) where more than a certain fraction of sequences have gaps.
> I've seen that option in other editors (e.g. MUST). Is there a way to
> do this in Jalview? If not, does anyone think it would be worthwhile
> to implement?

It's definitely worthwhile - and Jalview doesn't do it yet. Unfortunately, I've been distracted by other Jalview development priorities. However, there's an entry (albeit a rather terse one) in our bug tracker here: http://issues.jalview.org/browse/JAL-508

Essentially, the approach I'd take is a dialog that lets the user select and/or hide columns based on any annotation's threshold or label content, analogous to the colour by annotation dialog (you could even reuse that dialog and just add hide/select buttons to it!).
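A minimal sketch of the annotation-threshold idea follows. It assumes each annotation row can be reduced to one numeric value per column, with NaN where a column is unannotated. `AnnotationColumnSelector`, `selectColumns` and `toRanges` are hypothetical names used for illustration, not Jalview's actual classes. The sketch selects the columns that pass the threshold and collapses them into contiguous ranges, which is the kind of input a hide-columns action would need.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch for illustration; not Jalview's actual classes.
class AnnotationColumnSelector {

    enum ThresholdMode { ABOVE, BELOW }

    /** Marks each column whose annotation value is above/below the threshold. */
    static BitSet selectColumns(double[] annotation, double threshold, ThresholdMode mode) {
        BitSet selected = new BitSet(annotation.length);
        for (int col = 0; col < annotation.length; col++) {
            double v = annotation[col];
            if (Double.isNaN(v)) {
                continue; // Assumption: NaN means "no annotation at this column".
            }
            boolean pass = (mode == ThresholdMode.ABOVE) ? v > threshold : v < threshold;
            if (pass) {
                selected.set(col);
            }
        }
        return selected;
    }

    /** Converts a column selection into inclusive [start, end] ranges. */
    static List<int[]> toRanges(BitSet selected) {
        List<int[]> ranges = new ArrayList<>();
        int start = selected.nextSetBit(0);
        while (start >= 0) {
            int end = selected.nextClearBit(start) - 1; // last set bit of this run
            ranges.add(new int[] {start, end});
            start = selected.nextSetBit(end + 1);       // -1 when no runs remain
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Any per-column numeric annotation row, e.g. a gap fraction.
        double[] annotation = {0.0, 0.9, 0.8, 0.1, Double.NaN, 0.7};
        BitSet sel = selectColumns(annotation, 0.5, ThresholdMode.ABOVE);
        for (int[] r : toRanges(sel)) {
            System.out.println(r[0] + "-" + r[1]); // prints 1-2, then 5-5
        }
    }
}
```

Passing a per-column gap-fraction row reproduces the gap-based hiding asked about at the start of the thread. Any other numeric annotation row is handled the same way.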

> I'd offer to do it, but my knowledge of Java is pretty much zilch. I
> do know C++ though, so it might be easy to learn enough about Java to
> do the job, with pointers from people who know the code well.

It's not too hard to do. In fact, someone experienced with the code could probably do it in a couple of days (including documentation and porting the feature to the Java applet). In your case, it'd be more like a couple of weeks' work on and off (or maybe a week full-time), once I'd pointed you to the key classes/methods.

I've also been thinking of adding GBLOCKS (http://molevol.cmima.csic.es/castresana/Gblocks.html), either as a service or as a built-in routine. This algorithm is one of a variety of methods developed for automatically selecting the most informative/useful regions of an alignment before performing further analysis (e.g. tree building).

Let me know if you're interested in taking this further! :)
Jim.

···

On 18/04/2011 15:41, Jérôme Hénin wrote:

This is a feature I have needed in the past, but I just ended up removing sequences with very large insertions for the sake of visualization. +1 for hiding columns!

···

On Apr 19, 2011 2:10 AM, “Jim Procter” <foreveremain@gmail.com> wrote:

>> In large multiple alignments, it can be convenient to hide positions
>> (columns) where more than a certain fraction of sequences have gaps.
>> I’ve seen that option in other editors (e.g. MUST). Is there a way to
>> do this in Jalview? If not, does anyone think it would be worthwhile
>> to implement?

>> In large multiple alignments, it can be convenient to hide positions
>> (columns) where more than a certain fraction of sequences have gaps.
>> I've seen that option in other editors (e.g. MUST). Is there a way to
>> do this in Jalview? If not, does anyone think it would be worthwhile
>> to implement?

> It's definitely worthwhile - and Jalview doesn't do it yet. Unfortunately,
> I've been distracted by other jalview development priorities. However,
> there's an entry (albeit rather terse) in our bug-tracker here:
> http://issues.jalview.org/browse/JAL-508
>
> Essentially, the approach that would be taken is to have a dialog that
> allows the user to select and/or hide columns based on any annotation's
> threshold or label content, analogous to the colour by annotation dialog
> (you could even use the same dialog, and just put hide/select buttons there
> instead!).

Indeed, this is much better (more flexible) than what I had in mind.

>> I'd offer to do it, but my knowledge of Java is pretty much zilch. I
>> do know C++ though, so it might be easy to learn enough about Java to
>> do the job, with pointers from people who know the code well.

> It's not too hard to do. It would probably take someone experienced with the
> code a couple of days to do, in fact (including documentation and porting
> the feature to the java applet). In your case, it'd be more like a couple
> weeks work on & off (or maybe a week full-time), once I'd pointed you to the
> key classes/methods.

Well, I suspect it's going to be a while before I have this kind of
time to dedicate to this.

> I've also been thinking of adding in GBLOCKS
> (http://molevol.cmima.csic.es/castresana/Gblocks.html) either as a service
> or as a built-in routine. This algorithm is one of a variety of methods that
> have been developed for automatically selecting the most informative/useful
> regions of an alignment prior to performing further analysis (e.g. tree
> building).

Again, better than what I was thinking (less arbitrary).

> Let me know if you're interested in taking this further ! :)

I'll probably need to find a willing student with a bit too much time
on her/his hands. We'll see...

Best,
Jerome

···

On 18 April 2011 18:10, Jim Procter <foreveremain@gmail.com> wrote:

On 18/04/2011 15:41, Jérôme Hénin wrote:

Andrew wrote:

> This is a feature I have had need of in the past, but instead just ended up removing sequences with very large
> insertions for the sake of visualization. +1 for hiding columns !

Thanks, Andrew... patches are also gratefully received! :)

> Well, I suspect it's going to be a while before I have this kind of
> time to dedicate to this.

After sleeping on it, I think my estimates were a bit too generous: it would take an experienced C++ programmer a bit less time to get their head around the code. However, you're probably still right that it'll take too long - unless you really fancy spending at least a couple of days figuring out how to get Jalview building on your machine and getting used to how it is structured.

>> I've also been thinking of adding in GBLOCKS
>> (http://molevol.cmima.csic.es/castresana/Gblocks.html) either as a service
>> or as a built-in routine. This algorithm is one of a variety of methods that
>> have been developed for automatically selecting the most informative/useful
>> regions of an alignment prior to performing further analysis (e.g. tree
>> building).

> Again, better than what I was thinking (less arbitrary).

This will almost certainly arrive with the 2.7.x series of releases - which will include JABAWS services for alignment annotation. However, I have a couple of other annotation-type services to implement before I get to this one.

>> Let me know if you're interested in taking this further ! :)

> I'll probably need to find a willing student with a bit too much time
> on her/his hands. We'll see...

If you do, then let me know. Otherwise, I'll see if I can fit a very basic implementation into the next-but-one release (2.7.1)... The simple case of marking regions that have one or more gaps in the current view is actually very easy to code. Putting it into the user interface takes a bit more time, and documentation takes the longest, unsurprisingly.
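The "simple case" of marking gapped regions in the current view might look something like the following. The sketch scans each column of the visible sequences, flags the column if any sequence has a gap there, and merges flagged columns into contiguous regions. This is a hypothetical sketch, not Jalview code. `GappedRegionFinder` is an illustrative name, and '-' and '.' are assumed to be the gap characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration; not Jalview code.
class GappedRegionFinder {

    // Assumption: '-' and '.' are the only gap characters.
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    /** Returns inclusive [start, end] runs of columns in which at least one sequence has a gap. */
    static List<int[]> gappedRegions(String[] view) {
        int width = 0;
        for (String seq : view) {
            width = Math.max(width, seq.length());
        }
        List<int[]> regions = new ArrayList<>();
        int start = -1; // start of the current run, or -1 when not inside a run
        for (int col = 0; col < width; col++) {
            boolean gapped = false;
            for (String seq : view) {
                // A sequence shorter than the view counts as gapped at this column.
                if (col >= seq.length() || isGap(seq.charAt(col))) {
                    gapped = true;
                    break;
                }
            }
            if (gapped && start < 0) {
                start = col; // open a new run
            } else if (!gapped && start >= 0) {
                regions.add(new int[] {start, col - 1}); // close the current run
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, width - 1}); // run extends to the last column
        }
        return regions;
    }

    public static void main(String[] args) {
        String[] view = {"AC-GT-", "A--GTC", "ACTG--"};
        for (int[] r : gappedRegions(view)) {
            System.out.println(r[0] + "-" + r[1]); // prints 1-2, then 4-5
        }
    }
}
```

Applied to the example, the regions are columns 1-2 and 4-5. Hiding those regions would leave only the fully ungapped columns 0 and 3.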

Jim.

···

On 19/04/2011 12:13, Jérôme Hénin wrote:

In that case, there might be a way to split the work. I can't be
remotely as efficient as you with the implementation, but I could
probably help with documenting (and testing as well).

Jerome

···

On 19 April 2011 14:17, Jim Procter <foreveremain@gmail.com> wrote:

> if you do, then let me know. Otherwise, I'll see if I can fit in a very
> basic implementation for the next but one release (2.7.1) ... the simple
> case of marking regions that have one or more gaps in the current view is
> actually very easy to code.. putting it in to the user interface takes a bit
> more time, and documentation takes the longest, unsurprisingly.