Phylogenetic trees calculated for specified alignment zones

james_starlight · 5 November 2014 15:06

Dear JalView users!

I need to perform a large-scale phylogenetic analysis of a big dataset of GPCR sequences, using as input only those residues of the multiple-sequence alignment that are involved in the ligand-binding site of the receptors (taken from structural data). I wonder which method of tree calculation would be best for my task (neighbour joining or pairwise-distance calculations), and how to select the residues properly (previously I’ve done it by Ctrl + left-click at the bottom of the alignment, which marks the corresponding zone in red). What additional details should I pay attention to during such calculations when dealing with a very large number of sequences?

Thank you for the help,

James

romain_studer · 5 November 2014 15:59

Dear James,

For aligning the sequences, I would recommend MAFFT (L-INS-i) or Clustal-Omega:
http://mafft.cbrc.jp/alignment/software/
http://www.clustal.org/omega/

Then, you can use Jalview to manually select the ligand-binding domain in your alignment.

You can also use TrimAl to select only the well-aligned positions that carry phylogenetic signal.
http://trimal.cgenomics.org/

For producing the tree on very big alignments, I would recommend FastTree. It produces quite good results and is very easy to use:
http://www.microbesonline.org/fasttree/

There is also RAxML, which was developed for big alignments:
http://sco.h-its.org/exelixis/web/software/raxml/index.html

Best regards,
Romain

-- 
Romain Studer
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridgeshire
CB10 1SD, UK 
Tel: +44 (0)1223 492 547
Twitter: @RomainStuder

james_starlight · 5 November 2014 20:18

Hi Romain,

Thank you very much for the explanation!
I’ve already used TrimAl as part of the Phylemon2 server and found it very useful.
Regarding calculating trees in Jalview from a subset of the residues: as far as I can tell, the tree is calculated from the subset only when the positions of the ligand-contacting residues are marked in red at the top, isn’t it? Is it also possible to check, from the output, exactly which subset a tree was calculated on?
Finally, regarding the accuracy of the tree calculation: which method should give the best results for an alignment of several hundred sequences?

James

james_starlight · 6 November 2014 09:09

Also, I’d be very thankful if someone could suggest a server for simple comparison of two phylogenetic trees and for cluster calculations.

James

andres · 6 November 2014 09:24

Hello

Compare2trees is a possibility:

http://www.mas.ncl.ac.uk/~ntmwn/compare2trees

Andreas

romain_studer · 6 November 2014 09:35

Yes, trees in Jalview are calculated based only on the marked positions.

For accuracy, I would not use the phylogenetic tools in Jalview.

I would rather do the following:

Select the positions in Jalview.
Copy them.
Paste them as new alignment.
Save the alignment in a new file.

And then I use FastTree or RAxML (or even PhyML if you have access to a good computer cluster).
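
If the binding-site columns are already known as numbers, the select/copy/paste/save steps above can also be scripted. The following is only a minimal sketch in plain Python, assuming an aligned FASTA input; the function names, the toy sequences and the column list `binding_site` are made up for illustration and are not part of Jalview or any of the tree programs.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def select_columns(records, columns):
    """Keep only the given 1-based alignment columns, in the order given."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to the same length")
    if any(c < 1 or c > length for c in columns):
        raise ValueError("column index outside the alignment")
    return [(name, "".join(seq[c - 1] for c in columns)) for name, seq in records]


def write_fasta(records):
    """Serialise (name, sequence) pairs back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy alignment and hypothetical ligand-contacting columns (1-based).
example = """>rec1
MK-LVA
>rec2
MRTLIA
"""
binding_site = [2, 4, 5]

subset = select_columns(read_fasta(example), binding_site)
print(write_fasta(subset), end="")
```

The resulting text can be saved to a new file and passed to the tree program in place of the full alignment.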

Romain

jalviewcrowdadmin · 6 November 2014 11:29

Hi James.

Romain and Andreas have provided some great suggestions. I thought it worth adding that, when using Jalview with web sites and other tools, the most convenient way to prepare selections for input (e.g. to a tree calculation) is to create a new view (Ctrl or Command + T), select all the columns you want to use for the calculation, and then use ‘View->Hide->All but selected region’ (Shift + Cmd/Ctrl + H) to hide everything that you do not want to include in the tree calculation. Creating a view minimises the number of additional windows containing the same alignment data, and the input data view can be given its own name and archived in a Jalview project. Once you have some results from the other tools, you can add them to the view and explore the alignment further via Jalview’s subfamily shading methods (or even the Sequence Harmony service, since it’s designed to predict functional site variation based on a set of defined subgroups on the alignment).

If you have any problems exporting alignment data from Jalview or importing the trees back into Jalview from RAxML or FastTree, send an email! We also now have PHYLIP format input/export support in the development versions of Jalview, which is useful when working with RAxML.
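
For anyone without a development build, a small script can write PHYLIP itself. This is a rough sketch only, assuming the sequential "relaxed" PHYLIP layout (a header with the sequence count and alignment length, then one name and one sequence per line); `to_relaxed_phylip` and the toy records are illustrative names, and whether a given tool accepts long names should be checked against its manual.

```python
def to_relaxed_phylip(records):
    """Format (name, sequence) pairs as sequential relaxed PHYLIP text."""
    if not records:
        raise ValueError("no sequences")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to the same length")
    # PHYLIP names cannot contain whitespace: keep the first word only.
    names = [name.split()[0] for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("sequence names are not unique after truncation")
    width = max(len(n) for n in names)
    lines = [f" {len(records)} {length}"]
    for n, (_, seq) in zip(names, records):
        lines.append(f"{n.ljust(width)} {seq}")
    return "\n".join(lines) + "\n"


# Toy records; the names and residues are made up for illustration.
records = [("seqA demo receptor", "KLV"), ("seqBB", "RLI")]
phylip_text = to_relaxed_phylip(records)
print(phylip_text, end="")
```

Checking that the names stay unique after truncation matters, because duplicate names are a common reason for a tree program to reject an alignment.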

Jim.

PS. Just to expand on Romain’s comment about accuracy: Jalview’s tree algorithms are rigorous but relatively primitive, and not considered appropriate for general phylogenetic analysis tasks. Also, Jalview doesn’t provide support for model selection (picking the right model to calculate the intersequence distances from the alignment) or bootstrapping (identifying the statistically significant branches in the tree). FastTree and RAxML both employ heuristic maximum likelihood searches to produce an accurate tree more quickly, and both include approximate support calculations.

james_starlight · 13 November 2014 19:30

Thank you very much for the suggestions!
I’ll try other software to calculate the trees!

Indeed, I’ve noticed that trees produced by Jalview often show wrong distributions and groupings of sequences, especially when calculated for a large number of aligned sequences. It’s also strange that I obtained better trees with the faster method (distance averaging using the BLOSUM matrix) than with the more accurate neighbour-joining algorithm.

James

james_starlight · 13 November 2014 21:17

Finally, regarding the FastTree utility: I’d be very thankful if someone could give me some ideas on how to improve the accuracy of the trees it produces (e.g. using bootstrapping or something else).
By default I’m using fasttree -n 10 hOR_multipleAlignments.fa > tree_fast_allSEQ.ph
providing only the bootstrap flag in addition to the standard input.
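
A note on the flag, as far as the FastTree usage text describes it: `-n` gives the number of alignments contained in the input file (PHYLIP format), not a number of bootstrap replicates, and FastTree already reports local support values by default. A classical bootstrap therefore needs resampled alignments as input. The sketch below shows only the resampling step in plain Python; the function names and the toy alignment are illustrative, and writing the replicates out and running the tree program on them is left aside.

```python
import random


def bootstrap_replicate(records, rng):
    """Resample alignment columns with replacement, keeping every row."""
    length = len(records[0][1])
    # The same column picks are applied to every sequence, so each
    # resampled column is a whole column of the original alignment.
    picks = [rng.randrange(length) for _ in range(length)]
    return [(name, "".join(seq[i] for i in picks)) for name, seq in records]


def bootstrap_replicates(records, count, seed=0):
    """Return `count` resampled alignments; `seed` makes the run repeatable."""
    rng = random.Random(seed)
    return [bootstrap_replicate(records, rng) for _ in range(count)]


# Toy alignment; names and residues are made up for illustration.
alignment = [("seqA", "KLVAG"), ("seqB", "RLIAG")]
replicates = bootstrap_replicates(alignment, count=10)
print(len(replicates), "replicates of", len(replicates[0][0][1]), "columns each")
```

Each replicate has the same sequences and length as the original alignment, so the replicates can be written one after another into a single file and analysed as multiple alignments.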

James
