RNA support to Jalview

Hello Anne - great to hear from you !

I started my internship on Jalview and VARNA yesterday and I would
like to ask you some questions.

No problem. I'm always happy to help.

I am currently trying to integrate tertiary interactions into the
current Jalview model, but I cannot find the file or class containing
the model of secondary interactions. I think it is in the datamodel
package, but nothing there matched what I expected for the secondary
structure. Maybe I had the wrong representation in mind: an array
containing the sequence and, for each nucleotide, the nucleotide with
which it interacts.

The secondary structure is stored as a list of pairwise contacts held in the jalview.datamodel.AlignmentAnnotation class. It's done this way because whilst secondary structure might be associated with a sequence, it might also be associated with a whole alignment, or a specific group of sequences. The field in question is jalview.datamodel.AlignmentAnnotation._rnasecstr

Make sure you are working off the 'develop' branch of Jalview - the currently released version (v2.7) does not have any of the RNA secondary structure capabilities available. If you register at http://issues.jalview.org/ I'll add your user to the development group so you can checkout and commit to the git repository at https://source.jalview.org/git/jalview.git

It is true that I still do not understand the whole architecture of
Jalview. I will try in the coming days to have a better understanding.

We should probably have a short skype meeting so I can walk you through the code, and answer any questions you might have. Shall we talk tomorrow morning ? (I'm free at 10am my time).

Also, I read the article on the Nested Containment List:
http://bioinformatics.oxfordjournals.org/content/23/11/1386.full
According to the authors, it is coded in C and has a plugin for
Python, but how can these lists be used in Java?

The data structure itself is not very complex, and I think you'll easily implement it in Java.
The way I was thinking this might work is to create a nested containment list that allows efficient search and subselection operations like:
jalview.datamodel.NCList<IntervalContainer> implements List
{
/**
* find all intervals on (if start_inclusive is set), and before or after start
*/
  public List<IntervalContainer> find(long start, boolean start_inclusive, boolean before_or_after);

/**
* find all intervals on or between start and end (according to start_inclusive and end_inclusive)
*/
  public List<IntervalContainer> find(long start, boolean start_inclusive, long end, boolean end_inclusive);
}

The IntervalContainer interface defines the basic methods for finding the location of the interval (e.g. getStart() and getEnd()).

Don't worry too much about ultrafast implementation for the moment - anything will be better than what Jalview has at the moment, which is simply a list of pairs that must be searched through each time.
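
For illustration only, here is a minimal sketch of how such a list could look in Java. It is a simplified take on the nested containment idea, not the published C implementation and not Jalview code. The names NCListSketch, SimpleInterval and findOverlaps are hypothetical, and the IntervalContainer interface is assumed to offer only the getStart() and getEnd() methods mentioned above. The sketch also uses a single closed-range query rather than the inclusive/exclusive flags proposed in the interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Assumed shape of the interface; only getStart()/getEnd() are named in the thread. */
interface IntervalContainer {
  long getStart();
  long getEnd();
}

/** Hypothetical interval type, used only to exercise the sketch. */
class SimpleInterval implements IntervalContainer {
  private final long start, end;
  SimpleInterval(long start, long end) { this.start = start; this.end = end; }
  public long getStart() { return start; }
  public long getEnd() { return end; }
  @Override public String toString() { return "(" + start + "," + end + ")"; }
}

/** Simplified nested containment list (illustrative, hypothetical class). */
class NCListSketch<T extends IntervalContainer> {
  /** One stored interval plus the intervals nested inside it. */
  private static final class Node<E> {
    final E item;
    final List<Node<E>> children = new ArrayList<>();
    Node(E item) { this.item = item; }
  }

  private final List<Node<T>> top = new ArrayList<>();

  NCListSketch(List<T> intervals) {
    List<T> sorted = new ArrayList<>(intervals);
    // Start ascending, end descending: a containing interval precedes its contents.
    sorted.sort((a, b) -> a.getStart() != b.getStart()
        ? Long.compare(a.getStart(), b.getStart())
        : Long.compare(b.getEnd(), a.getEnd()));
    Deque<Node<T>> ancestors = new ArrayDeque<>();
    for (T iv : sorted) {
      // Drop ancestors that end before this interval does: they cannot contain it.
      while (!ancestors.isEmpty() && ancestors.peek().item.getEnd() < iv.getEnd()) {
        ancestors.pop();
      }
      Node<T> node = new Node<>(iv);
      if (ancestors.isEmpty()) {
        top.add(node);
      } else {
        ancestors.peek().children.add(node);
      }
      ancestors.push(node);
    }
  }

  /** All stored intervals overlapping the closed range [from, to]. */
  List<T> findOverlaps(long from, long to) {
    List<T> hits = new ArrayList<>();
    collect(top, from, to, hits);
    return hits;
  }

  private void collect(List<Node<T>> level, long from, long to, List<T> hits) {
    // Within one level no interval contains another, so starts and ends both
    // increase: binary search for the first sibling ending at or after 'from'.
    int lo = 0, hi = level.size();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (level.get(mid).item.getEnd() < from) lo = mid + 1; else hi = mid;
    }
    for (int i = lo; i < level.size() && level.get(i).item.getStart() <= to; i++) {
      Node<T> n = level.get(i);
      hits.add(n.item);
      collect(n.children, from, to, hits);
    }
  }
}
```

With the intervals (1,6), (2,5), (9,17), (10,16) and (12,15), a query for position 10 visits only the second top-level interval and its descendants, and returns (9,17) and (10,16). Note that this reports every interval spanning the position; keeping only the pairs with an endpoint at that position would need one extra filter on the result.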

Jim.

···

On Tue Jun 12 12:50:01 2012, Ménard Anne wrote:

Dear Anne.

I created an account on http://issues.jalview.org with the username: menard.

I've added you to the list of users that are allowed to push to the jalview git repository. You should be able to check out the code with a command like:

git clone https://menard@source.jalview.org/git/jalview.git

You should then create your own branch to work in off the development branch

git checkout develop
git checkout -b <your own branch name>

you'll be able to make this branch public by doing this:
git push --all

Check that you can see your branch by looking at the public repository :
http://source.jalview.org/gitweb/?p=jalview.git;a=heads

Since I could not get access to the git sources, I used the
Google Summer of Code 2011 source code, but it contained only the
sources and not the utils, lib, examples, etc.

please take a look at the latest code in the develop branch. I've done some work to integrate the GSOC 2011 code into the core of Jalview.

jalview.datamodel.NCList<IntervalContainer> implements List
{
/**
  * find all intervals on (if start_inclusive is set), and before or after start
  */
   public List<IntervalContainer> find(long start, boolean start_inclusive, boolean before_or_after);

/**
  * find all intervals on or between start and end (according to start_inclusive and end_inclusive)
  */
   public List<IntervalContainer> find(long start, boolean start_inclusive, long end, boolean end_inclusive);
}

In fact, I thought we would import a library (none is available for
Java, so we would use JPython). I also thought it would take longer to code.

I don't quite understand what you said here. Are you suggesting that you import a library ? If you can find a native Java implementation that is licensed in a way compatible with the GPL, then that would be fine, but I don't think a JPython library would be efficient for this.

simply a list of pairs that must be searched through each time.

I did not understand what that means but it might be a problem with English.

What I mean is that currently, Jalview stores base pairs like:
Pairs: ((..))..((.(..)))
Jalview datamodel: (1,6),(2,5),(9,17),(10,16),(12,15)
So to find all pairs involving base 10, Jalview would need to check the whole list. A nested containment list would mean the search would only be made on base pairs involved in the region of base 10.
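
To make the example concrete, the following sketch derives that pair list from the bracket string and then runs the kind of whole-list search described above. It is an illustration only: BasePairListSketch and its methods are hypothetical names, and this is not how Jalview itself stores or parses the structure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: dot-bracket string to a flat list of base pairs. */
class BasePairListSketch {
  /** Returns 1-based {open, close} pairs, ordered by opening position. */
  static List<int[]> parse(String dotBracket) {
    List<int[]> pairs = new ArrayList<>();
    Deque<int[]> open = new ArrayDeque<>();
    for (int i = 0; i < dotBracket.length(); i++) {
      char c = dotBracket.charAt(i);
      if (c == '(') {
        int[] p = {i + 1, -1};
        pairs.add(p);      // reserve the slot now so output follows opening order
        open.push(p);
      } else if (c == ')') {
        if (open.isEmpty()) {
          throw new IllegalArgumentException("Unbalanced ')' at " + (i + 1));
        }
        open.pop()[1] = i + 1;   // close the most recently opened pair
      }
    }
    if (!open.isEmpty()) {
      throw new IllegalArgumentException("Unbalanced '('");
    }
    return pairs;
  }

  /** Linear scan: every pair is inspected, whatever the query position. */
  static List<int[]> pairsInvolving(List<int[]> pairs, int base) {
    List<int[]> hits = new ArrayList<>();
    for (int[] p : pairs) {
      if (p[0] == base || p[1] == base) hits.add(p);
    }
    return hits;
  }

  /** Formats pairs as "(1,6),(2,5),...". */
  static String format(List<int[]> pairs) {
    StringBuilder sb = new StringBuilder();
    for (int[] p : pairs) {
      if (sb.length() > 0) sb.append(',');
      sb.append('(').append(p[0]).append(',').append(p[1]).append(')');
    }
    return sb.toString();
  }
}
```

Parsing "((..))..((.(..)))" this way gives (1,6),(2,5),(9,17),(10,16),(12,15), and pairsInvolving(pairs, 10) returns (10,16) after looking at all five entries.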

For tonight, I'll look at the file
jalview.datamodel.AlignmentAnnotation and write the algorithm and data
structure for the nested containment list, as Yann Ponty advised me.

OK.

Can I disturb you again tomorrow if I have more questions?

Sure.

I think we should await the return of Yann to talk on skype.

OK. I'll wait until tomorrow to talk.

Jim.

···

On Tue Jun 12 16:12:04 2012, Ménard Anne wrote:

Dear all,

Thank you for your response.

I can't connect to the server with the following command:
git clone https://menard@source.jalview.org/git/jalview.git

When I enter the password, I get the following error:
fatal: https://menard@source.jalview.org/git/jalview.git/info/refs
download error - server certificate verification failed. CAfile:
/etc/ssl/certs/ca-certificates.crt CRLfile: none

However, the password is correct, since it is the same one I use on
issues.jalview.org.

Pairs: ((..))..((.(..)))
Jalview datamodel: (1,6),(2,5),(9,17),(10,16),(12,15)
So to find all pairs involving base 10, Jalview would need to check the whole list. A nested containment list would mean the search would only be made on base pairs involved in the region of base 10.

Let me summarize to check that I understand.
I) I enter an ID in Jalview, e.g. RF00360.
Jalview will fetch it from Rfam: http://rfam.sanger.ac.uk/family/my_id
For this example: http://rfam.sanger.ac.uk/family/RF00360

II) Then I go to Rfam. I have access to all the necessary information
(sequences, secondary structure prediction, alignments, ...) but I can
only download the alignment in Stockholm format.
So I think the file is "downloaded" into memory and Jalview formats it
to display the alignment nicely.

III) I don't see how the secondary structure is obtained. Is it read
from an Rfam file or recalculated from the sequence?
What are the references for the sequence alignment?
The sequence consensus is clear to me.

Then I suppose that VARNA helps to turn "((.))" into a visual display
of the sequence interactions.
This last part is where I am working: I have to extend "((" so that it
can represent different non-canonical interactions: Hoogsteen and Sugar.

Is that correct so far? And where can I find the information
mentioned above?

Sincerely,
Anne .

···

2012/6/13, Jim Procter <jprocter@compbio.dundee.ac.uk>:


Hello Anne.

Good evening everyone,
I made good progress on the RNAML parser code this afternoon, so I
did not ask any questions by email :D

ace :)

I created a branch under my name in the git repository, but not in the
right place. I'll take care of deleting and recreating it. It is not the
right version either, because I don't know how to export a git project
from Eclipse.

ok - these things are often done more easily from the command line, but EGit is catching up. You should be able to do a fast-forward merge from 'develop' onto your branch to bring it up to date. You'll need to do this periodically to pull in changes as I update the development branch. It's also possible to do it from the command line (git checkout menard; git merge --no-ff develop).

However, I have a question. Yann and I discussed the possibility of
supporting the RNAML file in two cases:

1) the user already has the sequence and just wants to retrieve the
annotations from the RNAML file (open the two files and then combine
the information)
2) the user doesn't have the sequence, so we retrieve both the
sequence and the annotations from the RNAML file.

Is this possible? Especially the first case.

both are possible. The second case is accomplished by supporting the basic jalview.io.AlignFile methods. The first case is accomplished by allowing RNAML files to be loaded via the 'Structure->Associate Structure with Sequence->From File' in the sequence ID popup menu (right click on an ID to get this).

The action is implemented in the jalview.gui.PopupMenu class, and the logic implemented in the jalview.gui.AssociatePdbFileWithSeq class. The simplest approach is to modify this class to detect if the file is an RNAML file rather than a PDB file, and if so, read the file and transfer the RNA structure to the destination sequence.
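
As a rough sketch of the detection step only, the check could look at the first lines of the file. Everything here is an assumption made for illustration: StructureFileSniffer and looksLikeRnaml are hypothetical names that do not exist in Jalview, and the test relies on RNAML files being XML documents whose root element is named rnaml, while PDB files start with plain-text records.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Jalview. */
class StructureFileSniffer {
  /**
   * Guesses whether the head of a file is RNAML rather than PDB.
   * Assumes the RNAML root element is called "rnaml".
   */
  static boolean looksLikeRnaml(String head) {
    for (String line : head.split("\n")) {
      String t = line.trim().toLowerCase(Locale.ROOT);
      if (t.isEmpty()) {
        continue;                 // skip blank lines
      }
      if (t.contains("<rnaml")) {
        return true;              // found the assumed root element
      }
      if (!t.startsWith("<")) {
        return false;             // plain-text record, e.g. a PDB HEADER line
      }
      // otherwise: XML prolog, DOCTYPE or comment line; keep looking
    }
    return false;
  }
}
```

A prolog or comment that wraps over several lines would defeat this simple line-based check, so a real implementation would more likely hand the stream to an XML parser and inspect the root element.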

Jim.

···

On Wed Jun 20 17:41:03 2012, Ménard Anne wrote:

Hi Anne.

I forgot to send this to Yann.
Here is the link to the website I created:
https://sites.google.com/site/jalview2012/
It is empty for the moment.

I originally thought you would use the NESCent wiki. Will you still use the wiki - or will you blog your progress at this site, instead ?

I am starting on the "first case", that is, modifying the
jalview.gui.AssociatePdbFileWithSeq class in order to add the
annotations to the sequence.
Next, I will look at showing the data in the Jalview window rather than
in the console.
Finally, I would like to retrieve the same information when the user
provides a PDB file (with the help of Annote3D).

great plan!

Have I fallen too far behind for Tori and the integration of
tertiary annotations?
I would not want to hold up progress.

Do not worry about this. You are on schedule at the moment.

All right, I can see all the elements in the console. In VARNA, the
sequence and the annotations are stored in an ArrayList named results,
but I did not understand where the results are stored for RfamFile
and StockholmFile, because the parse() method returns void.
In StockholmFile, the data is stored in one or several Hashtables, but
how is it sent on to be displayed on the screen?

Jalview uses a variant* on the 'Builder' design pattern to instantiate an alignment from a data source. The actual data source is specified via a FileParse object (which handles low-level IO error handling, multi-pass and multi-part streams), and the abstract architecture for the alignment data provider is specified by jalview.io.AlignFile. With a few exceptions, concrete file formats are implemented by classes extending AlignFile.
[ * - it is a builder pattern, but only just ;) ]

Please look at the jalview.io.AppletFormatAdapter.readFile method to see how Jalview builds an annotated jalview.datamodel.Alignment object from these classes (and please ignore the embarrassing code duplication between jalview.io.AppletFormatAdapter.readFile and jalview.io.AppletFormatAdapter.readFromFile; these methods should be merged so the same logic is used in both).
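
The general idea, a parse() method that returns nothing and a caller that collects what the parser stored, can be sketched as follows. This is a toy model under stated assumptions, not Jalview's code: all the Sketch* classes and their methods are hypothetical, and a tiny FASTA-like reader stands in for the real file formats.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an AlignFile-style parser: parse() returns nothing. */
abstract class SketchAlignFile {
  protected final List<String> names = new ArrayList<>();
  protected final List<String> sequences = new ArrayList<>();

  /** Reads the text and accumulates results in the fields above. */
  abstract void parse(String text);

  List<String> getNames() { return names; }
  List<String> getSequences() { return sequences; }
}

/** Hypothetical concrete format: a minimal FASTA-like reader. */
class SketchFastaFile extends SketchAlignFile {
  @Override
  void parse(String text) {
    for (String raw : text.split("\n")) {
      String line = raw.trim();
      if (line.isEmpty()) continue;
      if (line.startsWith(">")) {
        names.add(line.substring(1));   // header line starts a new record
        sequences.add("");
      } else if (!sequences.isEmpty()) {
        int last = sequences.size() - 1;
        sequences.set(last, sequences.get(last) + line);  // append residues
      }
    }
  }
}

/** Hypothetical result object standing in for an alignment. */
class SketchAlignment {
  final List<String> names;
  final List<String> sequences;
  SketchAlignment(List<String> names, List<String> sequences) {
    this.names = names;
    this.sequences = sequences;
  }
}

/** Hypothetical adapter: picks a parser, runs it, then collects what it stored. */
class SketchFormatAdapter {
  static SketchAlignment readFile(String text, String format) {
    SketchAlignFile parser;
    if ("FASTA".equalsIgnoreCase(format)) {
      parser = new SketchFastaFile();
    } else {
      throw new IllegalArgumentException("Unsupported format: " + format);
    }
    parser.parse(text);   // void: the results stay inside the parser's fields
    return new SketchAlignment(parser.getNames(), parser.getSequences());
  }
}
```

In this toy version the display code never sees the parser at all: it only receives the object that the adapter assembled from the parser's fields after parse() finished.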

Jim.

···

On Thu Jun 21 12:57:25 2012, Ménard Anne wrote: