Jalview suggestion

Hello Deepak - this is a great suggestion!

I've been meaning to introduce 'real' support for taxonomic names in
Jalview for some time, but there always seemed to be something else that
took higher priority. However - we will definitely have support for this
in the next major Jalview release, and I'll do what I can to introduce
basic support for this even sooner.

For the moment, you could consider running this Groovy script (via the
Groovy console under the Tools menu) - which will do a search and
replace to move whatever appears in square brackets in the sequence
description to the beginning of the sequence name:

/// groovy script for inserting taxon name at beginning of sequence ID
import jalview.datamodel.*;
import jalview.gui.AlignFrame;
import jalview.gui.AlignViewport;

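// matches the species name in [square brackets]; the closing bracket and
// anything after it are consumed too, so no stray ']' is left in the new name

def descpattern = ~/.*\[([^\]]+)\].*/;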

def af = Jalview.getAlignframes();

// walk through all alignments, processing all sequences

for (ala in af)
{
     def al = ala.viewport.alignment;
     if (al!=null)
     {
         SequenceI[] seqs = al.getSequencesArray();
         for (sq in seqs)
         {
             if (sq!=null) {
                if (sq.getDatasetSequence()!=null)
                {
                    // extract species from dataset sequence description
                    // and place at beginning of ID string in alignment
                    def mtch = sq.getDatasetSequence().getDescription().replaceAll(descpattern,'$1') +
                               "|" + sq.getDatasetSequence().getName();
                    sq.setName(mtch);
                }
             }
          }
      }
      ala.repaint()
}

/// end of script - open Tools->Groovy console, copy and paste into
/// panel and hit the run button!

It's a bit rough and ready, but should do what you want for the moment.
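
If you want to see what the search and replace does before touching a real
alignment, here is a minimal standalone sketch that uses no Jalview classes
and should run in any Groovy console. The helper name taxonFirst, the
sample IDs and the sample descriptions are all made up for illustration.
It assumes the organism sits in the last pair of square brackets, and it
uses a pattern that also consumes the closing bracket. Unlike the script
above, it leaves the name alone when there are no brackets or no
description at all.

```groovy
// Standalone sketch: taxonFirst is a hypothetical helper, not part of Jalview.

// captures the text inside the last [square brackets] of a description
def taxonPattern = ~/.*\[([^\]]+)\].*/

def taxonFirst = { String name, String description ->
    // treat a missing description as empty so the match simply fails
    def m = (description ?: '') =~ taxonPattern
    // only prepend the taxon when the whole description matched the pattern
    m.matches() ? m.group(1) + '|' + name : name
}

// made-up examples in the style of NCBI FASTA description lines
assert taxonFirst('XP_000001.1', 'hypothetical protein [Homo sapiens]') == 'Homo sapiens|XP_000001.1'
assert taxonFirst('XP_000002.1', 'kinase [partial] [Mus musculus]') == 'Mus musculus|XP_000002.1'
assert taxonFirst('XP_000003.1', 'no organism here') == 'XP_000003.1'
```

The same guard could be folded into the loop in the main script if some of
your sequences turn out to have no description.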

Hope that helps!
Jim.

···

On 18/07/2014 18:17, Deepak Barnabas wrote:

Dear Jim,

Jalview is indispensable to the vast majority of structural biologists, including myself. Thanks for your good work with Jalview! We use it on an everyday basis. I have a small suggestion. You probably already have this option somewhere, and I am just not looking hard enough. When visualising multiple alignments, it would be very helpful if the organism name were displayed instead of the sequence name. A FASTA format download of multiple sequences following a BLAST search at NCBI always has the organism name in the sequence description, within square brackets, so would it not be easy (as an option) for Jalview to extract the string within these brackets and display it instead of the sequence name? Currently, one has to hover over each sequence name individually to display its description and look for the organism. Let me know if I'm missing something.

Thanks!
Deepak

The University of Dundee is a registered Scottish Charity, No: SC015096

Seems my script got a bit truncated - here it is again:

/// groovy script for inserting taxon name at beginning of sequence ID
import jalview.datamodel.*;
import jalview.gui.AlignFrame;
import jalview.gui.AlignViewport;

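// matches the species name in [square brackets]; the closing bracket and
// anything after it are consumed too, so no stray ']' is left in the new name

def descpattern = ~/.*\[([^\]]+)\].*/;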

def af = Jalview.getAlignframes();

// walk through all alignments, processing all sequences

for (ala in af)
{
   def al = ala.viewport.alignment;
   if (al!=null)
   {
      SequenceI[] seqs = al.getSequencesArray();
      for (sq in seqs)
      {
         if (sq!=null) {
            if (sq.getDatasetSequence()!=null)
            {
                // extract species from dataset sequence description
                // and place at beginning of ID string in alignment
                def mtch = (
                    sq.getDatasetSequence().getDescription().replaceAll(descpattern,'$1')
                    +"|"+sq.getDatasetSequence().getName());
                sq.setName(mtch);
            }
         }
      }
  }
  ala.repaint()
}

/// end of script - open Tools->Groovy console and paste this in

···

On 21/07/2014 10:36, Dr JB Procter wrote:

_______________________________________________
Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss