Thanks for the advice. I don't think that BioPython has a problem; I've found it very useful for going back and forth between tab-delimited and FASTA formats when necessary. I should stress that I only use it as a parser; all other functions are performed elsewhere. The only minor inconvenience I've found is that it sometimes has problems with long path lengths.
Sadly, I don't code. I maintain a rather large flatfile database of annotated sequences, and it is necessary to have the sequence in a single field. My problem is that occasionally I want to re-upload manipulated sequences to that database. When I do this, I have to do it with multiple (still quite large) subsets of data, so I'm always looking to lose a step. If the FASTA file contains line feeds, each line of a sequence becomes a new record, and I can't get that program to ignore line feeds.
I've just found that if you open the alignment in MEGA 5 and export as a FASTA (*.mas) file you can eliminate the line feeds.
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Jim Procter
Sent: Wednesday, January 26, 2011 12:26 PM
Subject: Re: [Jalview-discuss] Fwd: Non - Delimited FASTA output
Hi Jared, Peter [and Peter!]
I've cc'ed this to Peter Cock, who maintains biopython, because, IMHO, it sounds like there is a problem with biopython's parser (see e.g. http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml).
As for workarounds, Jalview doesn't have a switch to prevent the pretty-printing of FASTA files, and I'm afraid I'm not convinced it's worth it. However, if you are actually coding, you could follow Peter Troshin's suggestion, and use:
On 26/01/2011 16:23, Peter Troshin wrote:
compbio.data.sequence.FastaSequence.getOnelineFasta() does that.
This java class can be found in the min-jaba-client.jar in Jalview's lib directory, or downloaded from the JABA web site : http://www.compbio.dundee.ac.uk/jabaw
However, I suspect you aren't actually coding in Java, in which case the easiest options would be to:
1. Use a different output format from Jalview to pass the data to biopython.
2. Pass it through a tool like EMBOSS's seqret (http://emboss.sourceforge.net/docs/themes/SequenceFormats.html#change) to normalise it.
3. Pipe the file through a script to remove the newlines
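For option 3, a script along the following lines might be enough. This is only a minimal sketch in plain Python (no biopython needed); the function names unwrap_fasta and unwrap_fasta_file are made up for illustration and are not part of Jalview, biopython or EMBOSS. It assumes every record starts with a ">" header line, and it silently drops any sequence text that appears before the first header.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration. Assumes each record begins
    with a '>' header line; blank lines are skipped, and text before
    the first header is dropped.
    """
    seq_parts = []
    have_record = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if have_record:
                yield "".join(seq_parts)
            yield line
            seq_parts = []
            have_record = True
        elif line.strip():
            # Collect wrapped sequence lines, ignoring stray whitespace.
            seq_parts.append(line.strip())
    # Emit the sequence of the last record.
    if have_record:
        yield "".join(seq_parts)


def unwrap_fasta_file(in_path, out_path):
    """Read a wrapped FASTA file and write a one-line-per-sequence copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in unwrap_fasta(src):
            dst.write(line + "\n")
```

Calling unwrap_fasta_file("alignment.fa", "alignment_oneline.fa") (both file names are placeholders) should then write a copy in which each record is exactly two lines: the header and the full sequence.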
Hope that helps!
-------- Original Message --------
[Jalview-discuss] Non - Delimited FASTA output
Wed, 26 Jan 2011 11:10:52 -0500
Ackers, Jared (DEWN)
Is there a way to output FASTA files of an alignment that do not contain line-feed characters within a sequence, i.e., where the entire sequence is on one line? BioPython will parse line-feed-containing FASTA to tab delimited, but I was hoping to circumvent this.
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.