Hi Lauren.
The notes I made yesterday about Rfam retrieval URLs can be found at the end of the email. But first, the advice about refactoring the Pfam database fetcher.
Currently, jalview has a set of classes for Pfam retrieval that look like this:
jalview.ws.dbsources.Pfam - abstract class that contains the code to retrieve, parse and annotate the alignment retrieved from a URL source. Currently, it is hardwired to annotate each retrieved sequence in the alignment with the 'PFAM' database reference used to retrieve the alignment.
jalview.ws.dbsources.PfamFull : concrete class extending Pfam that provides the URL for retrieving the whole family
jalview.ws.dbsources.PfamSeed : concrete class extending Pfam for retrieving just the seed alignment
The PfamFull and PfamSeed classes are registered as database sources in the jalview.ws.SequenceFetcher() constructor - note that only a reference to the class is passed to the addDBRefSourceImpl function.
Adding an RfamSeed and RfamFull class:
The ideal route would be to re-use as much of the existing code in jalview.ws.dbsources.Pfam as possible. To do this, you need to generalise the jalview.ws.dbsources.Pfam class. You can do this in a couple of ways; I'd suggest trying out Eclipse's refactoring tool:
1. Select Pfam at the beginning of the class definition in the jalview.ws.dbsources.Pfam class.
2. In the refactoring menu, select 'Extract superclass'.
3. The dialog will let you enter a new class name (Xfam), and also allow you to select methods in the Pfam class that you want to either 'pull up' or define as abstract in the new superclass. All you need to pull up is the getSequenceRecords method, but you'll also need to define the getPFAMURL and getDbVersion() methods as abstract in Xfam - this is because they are mentioned in the getSequenceRecords method (try leaving either or both of them out and see what happens).
4. Once you've selected the methods that should be moved into the new superclass, hit the next button, and you'll be shown various previews, and also told if any of the changes you make cause any errors. You can always go back to change the options you selected.
5. Finally, cross your fingers and hit finish to do the refactoring.
The code will all run as before, since refactoring a class hierarchy modifies the structure without affecting the actual run-time behaviour of the code that is executed.
The new class structure looks like this:
Xfam <- Pfam
Pfam <- PfamFull
Pfam <- PfamSeed
(This is UML notation. The <- means 'superclass of' when reading from left to right, or 'extends' when reading from right to left.)
What you will then need to do is make three new classes:
Xfam <- Rfam
Rfam <- RfamFull
Rfam <- RfamSeed
However, this means that Xfam should contain only methods relevant to both Rfam and Pfam, and you'll notice that Xfam is still contaminated with a reference to the PFAM database accession label. The way to fix this is:
1. Introduce a new abstract method in Xfam, and replace the references to DBRefSource.PFAM in the getSequenceRecords method with calls to it:
abstract String getXfamSource();
2. Each specific Xfam source family should implement this, for example, in jalview.ws.dbsources.Pfam:
public String getXfamSource() { return jalview.datamodel.DBRefSource.PFAM; }
will define the correct parent reference for all the PFAM family sources.
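To make the end result concrete, here is a minimal, self-contained sketch of the hierarchy after both steps. It is not the real Jalview code: DBRefSourceSketch stands in for jalview.datamodel.DBRefSource (the RFAM constant is an assumption, you may need to add it), and describeDbRef is a made-up stand-in for the part of getSequenceRecords that labels each sequence.

```java
// Stand-in for jalview.datamodel.DBRefSource; the RFAM constant is an assumption.
final class DBRefSourceSketch {
    static final String PFAM = "PFAM";
    static final String RFAM = "RFAM";
}

// Shared superclass: nothing in here mentions PFAM or RFAM directly.
abstract class Xfam {
    // Each family supplies its own database source label.
    abstract String getXfamSource();

    // Hypothetical stand-in for the labelling done inside getSequenceRecords.
    String describeDbRef(String accession) {
        return getXfamSource() + ":" + accession;
    }
}

abstract class Pfam extends Xfam {
    @Override
    public String getXfamSource() {
        return DBRefSourceSketch.PFAM;
    }
}

abstract class Rfam extends Xfam {
    @Override
    public String getXfamSource() {
        return DBRefSourceSketch.RFAM;
    }
}

// Concrete sources inherit the label from their family class untouched.
class PfamSeed extends Pfam {
}

class RfamSeed extends Rfam {
}

class XfamSketch {
    public static void main(String[] args) {
        System.out.println(new PfamSeed().describeDbRef("ACC1")); // prints PFAM:ACC1
        System.out.println(new RfamSeed().describeDbRef("ACC1")); // prints RFAM:ACC1
    }
}
```

The point of the sketch is that the shared code in Xfam only ever asks getXfamSource() for the label, so adding the Rfam family needs no change to Xfam or to the Pfam classes.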
Ok. It sounds long-winded, but that's because I've spelled out why you need to do each operation. It actually took me less than five minutes to refactor and add in the new abstract method, as opposed to a lot of manual copying and pasting, find/replaces and renaming of files. Note - I've not had to touch the PfamFull or PfamSeed classes at all. In principle, all you'll need to do is create new classes for the abstract class Rfam, and then the concrete classes RfamFull and RfamSeed, using the Eclipse 'New class' wizard, and then fill in the method code specific to each class.
==== Retrieving Rfam families via a REST web service
Base url for html is: http://rfam.janelia.org/
This is the form from janelia that allows you to download the alignment:
<FORM method="POST" action="/cgi-bin/getalignment">
<INPUT type="radio" name="type" value="seed" CHECKED> Seed (5 sequences)<br />
<INPUT type="radio" name="type" value="full"> Full (45 sequences)
<br />
Format:
<SELECT name="fmt" size=1>
<OPTION value="stockholm" SELECTED>Stockholm
<OPTION value="text">Plain text
<OPTION value="jalview">Jalview java viewer
<OPTION value="msf">GCG MSF format
<OPTION value="afasta">Aligned FASTA format
</SELECT>
<br />
<INPUT type="hidden" name="name" value="DsrA">
<INPUT type="submit" value="Retrieve alignment">
</FORM>
The form fields can be assembled into a QUERY_STRING using the CGI (http://en.wikipedia.org/wiki/Common_Gateway_Interface) URL 'GET' format.
In the URL below, the angle brackets denote places where the class needs to provide info on the query:
http://rfam.janelia.org/cgi-bin/getalignment?type=<seed|full>&fmt=stockholm&name=<familyname>
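As a rough sketch of how an Rfam class might fill in that template, the snippet below builds the URL from the alignment type and the family name. The class and method names (RfamUrlSketch, alignmentUrl) are made up for illustration and are not part of Jalview; the only things taken from the notes above are the base URL, the parameter names and the DsrA example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jalview: assembles the getalignment query.
final class RfamUrlSketch {
    static final String BASE = "http://rfam.janelia.org/cgi-bin/getalignment";

    // full == true asks for the full alignment, false for the seed alignment.
    static String alignmentUrl(boolean full, String familyName) {
        try {
            return BASE
                + "?type=" + (full ? "full" : "seed")
                + "&fmt=stockholm"
                + "&name=" + URLEncoder.encode(familyName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://rfam.janelia.org/cgi-bin/getalignment?type=seed&fmt=stockholm&name=DsrA
        System.out.println(alignmentUrl(false, "DsrA"));
    }
}
```

In the real classes, RfamSeed and RfamFull would each return their own variant of this string from the URL method they inherit as abstract, in the same way PfamSeed and PfamFull do now.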
···
=====
Ok. The Groovy scripts will come in another email, and I'll talk to you Thursday am!
Jim.
--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.