linkUrl and regexp

G'day Sebastien.

Sebastien MORETTI wrote:

With JalView 2.4, we try to extract an identifier from gene description
to make an external link.

Does it work with sequence description ?

no it doesn't, I'm afraid. When I put the regex feature in, I tried to
keep the behaviour similar to 2.3 to avoid 'unexpected' additional URLs
being generated.

If the URL generation is extended so it operates on the description line
then static URL links result in the whole description line being
substituted for $SEQUENCE_ID$ - so a sequence header like

FER1_SPIOL/1-147 Ferredoxin-1, chloroplast precursor

generates two SRS links - one for FER1_SPIOL and the other for
"Ferredoxin-1, chloroplast precursor".

I didn't want to clutter the existing URL link menu too much, so I
didn't include this additional URL generation step, but the
implementation is trivial (and I'll leave it in the development code,
but commented out). Does anyone else have an opinion on whether this
behaviour should be included by default ?

If it does, how to extract this kind of pattern:
GENEID=ENSPTRG00000030533 TAXID=9598

I have tried this syntax, but nothing happens.
$SEQUENCE_ID=/GENEID=(\w+) /=$

The easiest way to generate these links without changing the existing
URL link behaviour would be to create a features file with URL links
embedded in the description line - these will be shown in the link menu
when the user right clicks on the sequence ID.

Jim
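
As a rough illustration of what the capture group in the attempted link is meant to pull out of the description, here is a minimal Python sketch. Python's `re` module is only a stand-in for Jalview's own regex handling, and `extract_gene_id` is a hypothetical helper, not part of Jalview.

```python
import re

# Description text taken from the pattern quoted in the question.
description = "GENEID=ENSPTRG00000030533 TAXID=9598"

# Same regex as in the attempted link: one capture group followed by a space.
pattern = re.compile(r"GENEID=(\w+) ")


def extract_gene_id(text):
    """Return the first captured group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract_gene_id(description))
```

The regex itself is sound for this description, which fits the explanation above that Jalview 2.4 applies it to the sequence ID only, not to the description.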

Hi Jim,

I think I will try this.
Should I use this kind of syntax:
http://… seq_ID

With URL as description, a tab, sequence identifier ?

Or is there a featureType keyword that turns a URL in the description into a link ?

Thanks

Jim
_______________________________________________
Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss

Sébastien

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056

Sebastien MORETTI wrote:

I think I will try this.
Should I use this kind of syntax:
http://… seq_ID

With URL as description, a tab, sequence identifier ?

not quite...

Here's an example from http://www.jalview.org/examples/exampleFeatures.txt :

<html>Fer2 Status: True Positive <a
href="http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00111">Pfam
8_8</a></html> FER_CAPAA -1 8 83 Pfam
Ferredoxin_fold Status: True Positive FER_CAPAA -1 3 93 Cath

Specify start and end as -1 in order to add a non-positional feature.
Jalview will recognise the anchors embedded in the description and parse
them as URL links, where the link text is used for the menu name.

I see that this process is not described in the features file
documentation (
http://www.jalview.org/help/html/features/featuresFormat.html ) -
that'll have to be fixed !

Jim

ps. With regard to 'magic FeatureTypes' - Jalview does not do anything
intelligent with feature types, currently (with the exception of the
disulphide_bond type), which is a big limitation. It is easy to imagine
that features like 'DBRef' or 'Reference' might be translated into
database accession links and citation lists, and may actually be
implemented in the future.

I have tried this syntax but it fails with an error in the java console:
Sequence not found: <html><a href='ENSG00000166598 - Search - Homo_sapiens - Ensembl genome browser 110'>ensembl</a></html> ENPL_HUMAN -1 -1 -1 DBRef

Feature lines look like this:
<html><a href='ENSG00000166598 - Search - Homo_sapiens - Ensembl genome browser 110'>ensembl</a></html> ENPL_HUMAN -1 -1 -1 DBRef
<html><a href='ENSPTRG00000005372 - Search - Homo_sapiens - Ensembl genome browser 110'>ensembl</a></html> HSP90B1_PANTR -1 -1 -1 DBRef

And sequence headers look like this:
>ENPL_HUMAN ID=ENST00000299767.4 GENEID=ENSG00000166598
>HSP90B1_PANTR ID=ENSPTRT00000009873.3 GENEID=ENSPTRG00000005372

So, it seems the features cannot be allocated to the proper sequences.
What should be the right sequence name in this case ?

Thanks

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056

"Sequence not found" message came from a forgotten tab. It is resolved now, sorry.

But I do not have links yet.

Now the feature file looks like this:
ensembl red
<html><a href='ENSG00000166598 - Search - Homo_sapiens - Ensembl genome browser 110'>ensembl</a></html> ENPL_HUMAN -1 -1 -1 ensembl
<html><a href='ENSPTRG00000005372 - Search - Homo_sapiens - Ensembl genome browser 110'>ensembl</a></html> HSP90B1_PANTR -1 -1 -1 ensembl
...

And sequence headers look like this:
>ENPL_HUMAN ID=ENST00000299767.4 GENEID=ENSG00000166598
>HSP90B1_PANTR ID=ENSPTRT00000009873.3 GENEID=ENSPTRG00000005372

I have nothing but the SRS link in the links menu.

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056

Another point !
With real start and end positions, links appear on the sequences but not in the link menu of the sequence name.

Also, the href MUST have its URL between double quotes, because it does not work with single quotes.

So now, how can I get links in the link menu of the sequence name ?
And / or how can I get links on sequences without the feature colour changing the amino acid colours ?

Best regards
Sébastien

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056

Hello Sebastien.

congratulations - you have uncovered more bugs in Jalview! As a result,
this is a lengthy reply... (but not as long as the three emails you have
already sent today - thanks for your patience)

Sebastien MORETTI wrote:

"Sequence not found" message came from a forgotten tab. It is resolved
now, sorry.

ah. OK. That error is usually raised when the parser cannot find enough
tab-separated fields from the feature file line (it's not a very
informative error message, I'm afraid..).
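
A quick way to catch a forgotten tab before loading the file is to count the tab-separated fields on each line. The sketch below is only an illustration: `check_feature_lines` is a hypothetical helper, and the assumptions that a feature line needs at least six fields and that a two-field line is a colour definition (like "ensembl red") are inferred from the examples in this thread, not taken from Jalview's parser.

```python
def check_feature_lines(lines, min_fields=6):
    """Return (line_number, field_count) for lines with too few tab-separated fields.

    Assumptions (not taken from Jalview's source): feature lines carry at
    least `min_fields` fields, and two-field lines are colour definitions.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped.strip():
            continue  # skip blank lines
        fields = stripped.split("\t")
        if len(fields) == 2:
            continue  # assumed "featureType<TAB>colour" definition
        if len(fields) < min_fields:
            problems.append((number, len(fields)))
    return problems


sample = [
    "ensembl\tred",
    # A space instead of a tab after the description: only five fields.
    "<html>x</html> ENPL_HUMAN\t-1\t0\t0\tDBRef",
    "<html>x</html>\tENPL_HUMAN\t-1\t0\t0\tDBRef",
]
print(check_feature_lines(sample))
```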

But I do not have links yet.

ahem - yes. about that.

Providing your feature file parses (ie it has tab characters separating
each field), then the problems are as follows:

1. As you have noticed, the html parser in jalview is _very_ basic. It
expects XML attributes to be quoted with double quotes ('"') not a
single quote (''') (also, please bear in mind that single quoted
attributes is non-standard XML and will not validate).

This means you should have a feature file with lines like:

<html><a
href="ENSG00000166598 - Search - Homo_sapiens - Ensembl genome browser 110">ensembl</a></html>
ENPL_HUMAN -1 0 0 DBRef

Note I have used '0' in the start/end fields. This is because I was in
error in my previous reply:

2. Non-positional features are specified by putting a '0' in both the
start and end (not a '-1' as I said previously - sorry).

Finally... the real problem:

3. Link display for non-positional features appears to be broken...
So even if you got the features to be listed in the sequence ID tooltip,
no links would be generated from the URL links embedded in the sequence
features.

This successfully scuppers my original suggestion of using
non-positional sequence features to append URLs......... however... I
have checked in a bug fix for #3. Both the applet and application now
generate URL links from links embedded in non-positional features.
Furthermore, the applet now displays non-positional features within the
tooltip.

can you try it out ?
(http://www.compbio.dundee.ac.uk/~ws-dev1/jalview/latest)
Jim.

ps. As an alternative, I have also re-instated the 'generate URL from
sequence description' code too, but made it so that only URLs which
contain regexes are used to process the description. This is still not
ideal, but more appropriate than simply making a URL substitution using
the whole description string.
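
Points 1 and 2 above (double-quoted href, tab-separated fields, '0' for start and end) can be put together in a small generator script. This is a minimal sketch under stated assumptions: `link_description` and `feature_line` are hypothetical helper names, the field order follows the example lines in this thread, and the example.org URL is a placeholder, not the real Ensembl address.

```python
def link_description(url, label):
    """Build an HTML description whose href is double-quoted, as the parser expects."""
    return '<html><a href="{}">{}</a></html>'.format(url, label)


def feature_line(description, sequence_id, feature_type,
                 start=0, end=0, sequence_index=-1):
    """Join the fields with tabs; start=end=0 marks a non-positional feature."""
    fields = [description, sequence_id, str(sequence_index),
              str(start), str(end), feature_type]
    return "\t".join(fields)


# Placeholder URL: substitute the real link target for each gene.
line = feature_line(
    link_description("http://example.org/gene/ENSG00000166598", "ensembl"),
    "ENPL_HUMAN",
    "DBRef",
)
print(line)
```

Generating the lines from a script rather than typing them avoids both the missing-tab and the single-quote problems.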

--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

as an addendum - this feature allows access to many more URLs extracted
from the database references and retrieved via DAS (use the 'fetch DB
references' function from the webservices menu). This includes direct links
to uniprot, taxonomy links (although the Uniprot NEWT taxonomy URLs that
are retrieved appear not to be working properly at the moment), and links to
DOIs for publications.

Jim.

--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

Many thanks !
Both strategies work, non-positional features within the tooltip & with linkURL.

I have kept linkLabel_1/linkUrl_1 only because I think it is easier for users and I have no feature file to create.
<param name="linkLabel_1" value="Ensembl" />
<param name="linkUrl_2" value="\\$SEQUENCE\_ID=/GEN... - Search - Homo_sapiens - Ensembl genome browser 110 TAXID/=\$" />

Maybe the documentation should mention that a regexp in $SEQUENCE_ID has to be backslash-escaped, depending on the language used to generate the applet parameters.

Thanks again.

p.s.: We will send you something for Christmas :wink:

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056

Sebastien MORETTI wrote:

Many thanks !
Both strategies work, non-positional features within the tooltip & with
linkURL.

great!

I have kept linkLabel_1/linkUrl_1 only because I think it is easier for
users and I have no feature file to create.

Understandable, and I agree. Non-positional links now serve almost the
same purpose, but they do have the advantage of offloading the
additional sequence metadata into another file, rather than stuffing it
all into the description line.

<param name="linkLabel_1" value="Ensembl" />
<param name="linkUrl_2"
value="\\$SEQUENCE\_ID=/GEN... - Search - Homo_sapiens - Ensembl genome browser 110
TAXID/=\$" />

Maybe the documentation should mention that a regexp in
$SEQUENCE_ID has to be backslash-escaped, depending on the language
used to generate the applet parameters.

Normally, this is not necessary. Are you talking about when the applet
tag is generated within a Perl CGI script ? (then all $ within double
quoted strings must be escaped)

If you escape the '$' and set the debug flag (param name="debug"
value="true") then jalviewLite raises an error because it cannot parse
the value of the linkURL_2 as a valid sequence ID regex link (in this
case it is because the /=$ cannot be found).
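
To make that failure mode concrete, here is a minimal Python sketch of how a `$SEQUENCE_ID=/regex/=$` token might be located and expanded. It is not Jalview's implementation: `expand_link` and `TOKEN` are hypothetical names, Python's `re` module stands in for Jalview's regex handling, and the example.org URL is a placeholder.

```python
import re

# Hypothetical matcher for the "$SEQUENCE_ID=/regex/=$" token in a link URL.
TOKEN = re.compile(r"\$SEQUENCE_ID=/(.+?)/=\$")


def expand_link(url_template, text):
    """Replace the token with the regex's first group (or whole match) found in text."""
    token = TOKEN.search(url_template)
    if token is None:
        raise ValueError("no $SEQUENCE_ID=/regex/=$ token in link URL")
    match = re.search(token.group(1), text)
    if match is None:
        return None  # the regex did not match, so no link is produced
    value = match.group(1) if match.re.groups else match.group(0)
    return url_template[:token.start()] + value + url_template[token.end():]


description = "GENEID=ENSPTRG00000030533 TAXID=9598"
good = r"http://example.org/search?q=$SEQUENCE_ID=/GENEID=(\w+) TAXID/=$"
print(expand_link(good, description))

# With a literal backslash left before the closing '$', the "/=$" terminator
# is no longer present and the token cannot be parsed.
bad = r"http://example.org/search?q=\$SEQUENCE_ID=/GENEID=(\w+) TAXID/=\$"
try:
    expand_link(bad, description)
except ValueError as error:
    print("rejected:", error)
```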

p.s.: We will send you something for Christmas :wink:

I'll look forward to it!

thanks for your patience, again...
Jim

Yes, the applet tags are generated within a Perl CGI script.
It should be the same for applet tags generated by Java code.

Thus, I have escaped $ surrounding SEQUENCE_ID and every backslashed metacharacter in the regexp like \w
\$SEQUENCE_ID=/GENEID=(\\w+) TAXID/=\$

This should not be done for single-quoted strings.
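
The escaping described above can be sketched as a small helper that prepares a link URL for embedding in a Perl double-quoted string. This is an illustration only: `perl_double_quote_escape` is a hypothetical name, and the set of escaped characters (backslash, `$`, `@`, `"`) is the usual minimum for Perl double-quoted strings rather than anything specified in this thread.

```python
def perl_double_quote_escape(text):
    """Escape text so a Perl double-quoted string reproduces it literally."""
    # Backslashes first, so the backslashes added below are not doubled again.
    escaped = text.replace("\\", "\\\\")
    for char in ("$", "@", '"'):
        escaped = escaped.replace(char, "\\" + char)
    return escaped


link_regex = r"$SEQUENCE_ID=/GENEID=(\w+) TAXID/=$"
print(perl_double_quote_escape(link_regex))
```

After Perl interpolates the escaped string, the applet receives the original, unescaped value.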

Thanks again and again
Sébastien

--
Sébastien Moretti
Department of Ecology and Evolution,
Biophore, University of Lausanne,
CH-1015 Lausanne, Switzerland
Tel.: +41 (21) 692 4221/4056