GenBankParser (JAL-1260)

Hi all,

I’ve just finished the GenBankParser (JAL-1260), but I need help mapping the parsed information into the Jalview data model. Can anybody point me to the documentation?

Thanks so much!

Cheers,

David


Thanks David - see the issue for my comments.

Jim.

On 14/12/2013 09:48, David Roldán Martínez wrote:

Hi,

I’ve seen the comments, but I’m afraid my doubt is much more basic. I don’t know what the following lines mean:
1 gatcctccat atacaacggt atctccacct caggtttaga tcaacaac ggaaccattg
61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
[…]

What I see here is a number (which I guess is related to the number of bases contained in each line: 60 bases/line) and six strings of bases, and I don’t know whether they are sub-sequences or whether they form part of the same sequence.

I’ve taken a look at embl_mapping.xml and I think I understand how the information is mapped, but I still don’t know how this applies to GenBank files such as the one I’m attaching.

BTW, do you want me to include these comments on the issue, or do you prefer to use the list for discussion?

Cheers,

David

genbank_sample.gb (10.3 KB)


2013/12/16 Jim Procter <jprocter@compbio.dundee.ac.uk>

_______________________________________________
Jalview-dev mailing list
[Jalview-dev@jalview.org](mailto:Jalview-dev@jalview.org)
[http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev](http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev)



Hi David.

I've seen the comments, but I'm afraid my doubt is much more basic.

Ah :)

I don't know what the following lines mean:
1 gatcctccat atacaacggt atctccacct caggtttaga tcaacaac ggaaccattg
61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa
[...]

What I see here is a number (which I guess is related to the number of
bases contained in each line: 60 bases/line) and six strings of bases,
and I don't know whether they are sub-sequences or whether they form
part of the same sequence.

They all form part of the same sequence. GenBank records typically describe just one contiguous nucleotide sequence, along with lots of annotation (including genes/protein(s) that might be present, etc.).
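
To make that concrete, here is a minimal Python sketch (not Jalview code; the function name `join_origin_lines` is hypothetical) that rebuilds one sequence from lines like the ones quoted above. It assumes the standard GenBank ORIGIN layout, in which the leading number is the 1-based position of the first base on that line (1, 61, 121, ...) rather than a base count, and the spaces between blocks of 10 bases are there only for readability.

```python
def join_origin_lines(lines):
    """Concatenate GenBank ORIGIN-style sequence lines into one string.

    Assumes `lines` are the lines that follow the ORIGIN keyword.
    """
    chunks = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "//" record terminator.
        if not fields or fields[0] == "//":
            continue
        # The first field is the position of the line's first base;
        # it is a coordinate, not part of the sequence itself.
        if fields[0].isdigit():
            fields = fields[1:]
        chunks.extend(fields)
    return "".join(chunks)


# Two of the lines quoted in this thread, copied as-is.
sample = [
    "61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct",
    "121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa",
]

sequence = join_origin_lines(sample)
print(len(sequence), sequence[:10])
```

The result is a single string of bases, which is the shape you would want before mapping it onto one sequence object in the data model; the annotation in the rest of the record would be handled separately.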

I've taken a look at embl_mapping.xml and I think I understand how the
information is mapped, but I still don't know how this applies to
GenBank files such as the one I'm attaching.

BTW, do you want me to include these comments on the issue, or do you
prefer to use the list for discussion?

Either comment on the bug or perhaps just email me directly, since your questions are more about understanding the semantics of the format. In fact, I think it would be worth us Skyping sometime this week. Would you be able to talk around 5-6pm your time tomorrow? I'll have actually looked at your patch by then and will be able to give you some feedback.

Jim.


On Mon Dec 16 22:22:07 2013, David Roldán Martínez wrote: