Hi,
In your manual about annotation files you describe:
http://www.jalview.org/help/oldhelp/html/features/annotationsFormat.html
…
You can associate an annotation with a sequence by preceding its definition with the line:
SEQUENCE_REF seq_name [startIndex]
…
I wonder what the exact format of seq_name is:
Imagine I get a FASTA file like this:
>db|183474|my_pet_protein
Do I have to put in the full id or are other variations ok?
SEQUENCE_REF db|183474|my_pet_protein 1
SEQUENCE_REF 183474 1
SEQUENCE_REF my_pet_protein 1
Background: Since accession numbers usually don’t tell you the species name, I would like to add the species info to the sequence name to quickly spot the organism, e.g. my_pet_protein|Escherichia_coli. But then I would need to change the seq_name in the annotation file if I can’t use a shorthand…
Thanks
Steffen
Hi Steffen - thanks for your mail!
Jalview's annotation file format uses exact string matches to associate tracks with a sequence. We made that decision because the format was designed as a way for other programs to generate data for import into Jalview.
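To illustrate what exact matching means for the three variations in the question, here is a minimal Python sketch. It is not Jalview's code: `find_exact` is a hypothetical helper, and it assumes the sequence ID is the full text of the FASTA header after the `>`.

```python
# Illustrative sketch only, not Jalview's implementation.
# Assumption: the sequence ID is the whole FASTA header text after '>'.

def find_exact(seq_ref, sequence_ids):
    """Return the ID that equals seq_ref exactly, or None if there is none."""
    return seq_ref if seq_ref in sequence_ids else None


sequence_ids = ["db|183474|my_pet_protein"]

# The three SEQUENCE_REF variations from the question.
for candidate in ("db|183474|my_pet_protein", "183474", "my_pet_protein"):
    print(candidate, "->", find_exact(candidate, sequence_ids))
```

Under that assumption only the full ID resolves; the two shorthand forms find no sequence.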
It is reasonably straightforward to allow substring-based matching like you suggest - Jalview does that for Newick tree import already, so the function is available - so I can create a patch right away, if you like. I've created a new feature request for this at http://issues.jalview.org/browse/JAL-1427
However, there might be backwards compatibility problems when an alignment includes different sequences where one sequence's ID is wholly contained in another, so I don't think I can make substring matching the default behaviour when parsing the SEQUENCE_REF tag in annotation files. Any thoughts?
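The containment problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patch itself: `find_by_substring` and `resolve` are hypothetical names, and preferring an exact match before falling back to a unique substring match is just one possible policy.

```python
# Illustrative sketch only, not Jalview's implementation.

def find_by_substring(seq_ref, sequence_ids):
    """Return every ID that contains seq_ref as a substring."""
    return [seq_id for seq_id in sequence_ids if seq_ref in seq_id]


def resolve(seq_ref, sequence_ids):
    """One possible policy (an assumption, not decided behaviour):
    exact match first, then a substring match only if it is unique."""
    if seq_ref in sequence_ids:
        return seq_ref
    matches = find_by_substring(seq_ref, sequence_ids)
    return matches[0] if len(matches) == 1 else None


# One ID is wholly contained in the other.
ids = ["my_pet_protein", "my_pet_protein|Escherichia_coli"]

# Plain substring matching finds both sequences, so it is ambiguous.
matches = find_by_substring("my_pet_protein", ids)
print(matches)

# Exact-first keeps existing annotation files pointing at the same sequence.
print(resolve("my_pet_protein", ids))
print(resolve("Escherichia", ids))  # unique substring, so it resolves
print(resolve("pet", ids))          # ambiguous, so no sequence is chosen
```

With a policy like this, files that already use full IDs would behave as before, and a shorthand would only be accepted when it identifies exactly one sequence.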
Jim.