cleaning up protein sequences

Suppose I've created a protein alignment based on a few hundred
sequences from a PSI-BLAST query. Many of the sequences in this
alignment will contain extra junk regions or, conversely, omitted
regions. I want to clean up these sequences so I can re-align them and
analyze them further.

When I'm reviewing my alignment in Jalview, I'd like to change the
junk regions into gaps. I don't want to just delete the junk with the
backspace key, because that would mess up the alignment and make my
review harder.
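A small sketch of the difference, assuming an aligned row is a string in which "-" marks a gap and residue positions are counted 1-based over the ungapped sequence. The function names are hypothetical and not part of Jalview or STRAP. Masking the junk keeps the row length, and therefore every column, unchanged. Deleting it shortens the row and shifts everything after it.

```python
def mask_outside(aligned, start, end, gap="-"):
    """Replace residues outside the 1-based inclusive range [start, end]
    with gaps, keeping the aligned row the same length."""
    out = []
    residue = 0  # 1-based number of the current residue in the ungapped sequence
    for ch in aligned:
        if ch == gap:
            out.append(ch)
            continue
        residue += 1
        out.append(ch if start <= residue <= end else gap)
    return "".join(out)


def delete_outside(aligned, start, end, gap="-"):
    """For comparison: deleting the residues instead shortens the row."""
    out = []
    residue = 0
    for ch in aligned:
        if ch == gap:
            out.append(ch)
            continue
        residue += 1
        if start <= residue <= end:
            out.append(ch)
    return "".join(out)


row = "XXMK-LVAYY"  # hypothetical row: junk "XX" before, "AYY" after
print(mask_outside(row, 3, 6))    # --MK-LV---  (still 10 columns)
print(delete_outside(row, 3, 6))  # MK-LV       (only 5 columns)
```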

Hi Jeremy,

This can be done in STRAP, and afterwards you can export the alignment back to Jalview.
STRAP has an export option that starts a Jalview session with the selected sequences,
so switching between the two programs is easy.

Follow these steps. It won't be that easy if you are not yet familiar with STRAP.

1. Drag and drop your sequence files into the STRAP alignment program.

2. Then take the core part of the BLAST query and save it as a new sequence file using a text editor. Drag this file into STRAP.

3. In the STRAP menu bar, choose Align > Align many sequences to same reference sequence.
   Take the truncated BLAST query as the reference sequence.

4. Choose ClustalW and press the Go button.

5. You get a result tab in which, for each sequence, the position of the reference is marked. At the top of the result tab there is a button "Process matches ... ".
   Click it.
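The following is not STRAP's code. It is a hypothetical sketch of how such a range could be derived from one pairwise alignment (two gapped rows of equal length, for example taken from the ClustalW output), assuming 1-based inclusive residue numbers and "-" as the gap character. The helper names and the example sequence name are made up for illustration.

```python
def covered_range(ref_row, target_row, gap="-"):
    """Return (first, last): the 1-based, inclusive residue range of the target
    spanned by positions aligned to reference residues, or None if there are none."""
    if len(ref_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")
    residue = 0  # running residue number in the ungapped target
    first = last = None
    for r, t in zip(ref_row, target_row):
        if t == gap:
            continue
        residue += 1
        if r != gap:  # this target residue is aligned to a reference residue
            if first is None:
                first = residue
            last = residue
    return None if first is None else (first, last)


def rename_line(name, ref_row, target_row):
    """Format one 'old new!start-end' line; leave the name as-is if nothing aligns."""
    span = covered_range(ref_row, target_row)
    if span is None:
        return f"{name} {name}"
    return f"{name} {name}!{span[0]}-{span[1]}"


print(rename_line("a2_Example.swiss", "--MKLV--", "AAMKIVCC"))
# a2_Example.swiss a2_Example.swiss!3-6
```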

You get something like

   a2_ArabidopsisThaliana.swiss     a2_ArabidopsisThaliana.swiss!22-47
   a2_CaenorhabditisElegans.swiss   a2_CaenorhabditisElegans.swiss!21-46
   ....
   a2_SaccharomycesCerevisiae.dssp  a2_SaccharomycesCerevisiae.dssp!21-46
   a2_XenopusLaevis.swiss           a2_XenopusLaevis.swiss!22-47

This is a renaming table. For example, it says that the file "a2_XenopusLaevis.swiss" is renamed to "a2_XenopusLaevis.swiss!22-47".

The suffix after the exclamation mark gives the residue position range.
In STRAP, a protein whose name carries such a "!" suffix is truncated to the given sequence positions.
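To make the "!" convention concrete, here is a hypothetical sketch (not STRAP code) that splits a name such as "a2_XenopusLaevis.swiss!22-47" into its base name and range and cuts the sequence accordingly. It assumes the range is 1-based and inclusive at both ends.

```python
import re
from typing import Tuple

# Matches names such as "a2_XenopusLaevis.swiss!22-47".
SUFFIX = re.compile(r"^(?P<base>.+)!(?P<start>\d+)-(?P<end>\d+)$")


def truncate_by_suffix(name: str, sequence: str) -> Tuple[str, str]:
    """Return (base_name, truncated_sequence).

    Names without a "!start-end" suffix are returned unchanged.
    Positions are assumed to be 1-based and inclusive.
    """
    m = SUFFIX.match(name)
    if m is None:
        return name, sequence
    start, end = int(m["start"]), int(m["end"])
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"{start}-{end} is outside a sequence of length {len(sequence)}")
    return m["base"], sequence[start - 1:end]


print(truncate_by_suffix("example.swiss!3-5", "ABCDEFG"))
# ('example.swiss', 'CDE')
```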
Copy this generated text into the "Rename proteins" dialog to rename the sequences.
After renaming, the sequences are narrowed to the given positions.
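The table is plain text with one "old new" pair per line. Here is a hypothetical sketch (not how STRAP itself parses the dialog input) that reads such text into a mapping and applies it to a dictionary of sequences, skipping blank lines and the "...." line. It assumes names contain no whitespace.

```python
def parse_rename_table(text):
    """Return {old_name: new_name} from whitespace-separated 'old new' lines."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Blank lines and elisions such as "...." do not have exactly two fields.
        if len(fields) != 2:
            continue
        old, new = fields
        mapping[old] = new
    return mapping


def apply_renames(sequences, mapping):
    """Return a new {name: sequence} dict with names replaced according to mapping."""
    return {mapping.get(name, name): seq for name, seq in sequences.items()}


table = """
   a2_ArabidopsisThaliana.swiss a2_ArabidopsisThaliana.swiss!22-47
    ....
         a2_XenopusLaevis.swiss a2_XenopusLaevis.swiss!22-47
"""
renames = parse_rename_table(table)
print(renames["a2_XenopusLaevis.swiss"])  # a2_XenopusLaevis.swiss!22-47
```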

There is one limitation: you need to
truncate the BLAST query yourself. If you would like STRAP to
find the relevant part of the full-length BLAST query automatically,
I would need to improve STRAP accordingly.

Finally, you can send the alignment back to Jalview and continue
in your familiar Jalview environment.

I hope this is helpful.
Cheers,
Christoph