PCA of MSA data

Dear Jalview users!

I’ve run into a problem analysing MSA data with the PCA method because of the very large number of sequences in the alignment. I’d be very thankful if someone could suggest how such an analysis could be performed better in this case, and which additional tools could be used to obtain statistics from a PCA of MSA data (e.g. the variance along each PC, % sequence conservation, etc.).

I’d be thankful for any suggestions,

James

Hi James,

I've run into a problem analysing MSA data with the PCA method because of the very large number of sequences in the alignment.

Unfortunately, Jalview's PCA function is very memory hungry. So, in the first instance, you should make sure you have maximised the memory available to Jalview. I'm actually going to be working on the PCA mechanism in the next few weeks, so if you'd like to give me an idea of the size of PCA you'd like to perform, I'll create a test case to see if the method can be made more efficient.

I'd be very thankful if someone could suggest how such an analysis could be performed better in this case, and which additional tools could be used to obtain statistics from a PCA of MSA data (e.g. the variance along each PC, % sequence conservation, etc.).

Ideally, Jalview would provide scree plots to indicate how much variance is explained by each ordinate, but for the moment you can access that in the text dumps from the File menu. Alternatively, if you can generate a sequence similarity matrix with Jalview, then you could import and analyse it in R. There are also a handful of libraries (including Bioconductor) that provide functions for calculating similarity matrices from MSAs.
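
If you would rather not use R, the same kind of analysis can be sketched in Python with NumPy. This is only an illustration, not Jalview's own procedure: the function name explained_variance, the optional double-centring step and the toy matrix are all assumptions made for the example. It takes a square, symmetric similarity matrix and reports the fraction of variance associated with each eigenvector, which is the information a scree plot would show.

```python
import numpy as np


def explained_variance(similarity, center=True):
    """Eigenvalues (descending) of a similarity matrix and the fraction
    of variance attributed to each one.

    Illustrative helper only; not part of Jalview or any library.
    """
    s = np.asarray(similarity, dtype=float)
    if s.ndim != 2 or s.shape[0] != s.shape[1]:
        raise ValueError("similarity matrix must be square")

    # Symmetrise to guard against small asymmetries in an exported matrix.
    s = (s + s.T) / 2.0

    if center:
        # Double-centring, as in classical principal coordinates analysis.
        # Assumption: Jalview's own PCA may treat the matrix differently.
        n = s.shape[0]
        j = np.eye(n) - np.ones((n, n)) / n
        s = j @ s @ j

    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(s)[::-1]

    # Negative eigenvalues carry no variance in this reading; clip them to 0.
    positive = np.clip(eigvals, 0.0, None)
    total = positive.sum()
    if total == 0.0:
        raise ValueError("matrix has no positive eigenvalues")
    return eigvals, positive / total


if __name__ == "__main__":
    # Toy 3x3 similarity matrix standing in for a matrix exported from Jalview.
    toy = [[2.0, 1.0, 0.0],
           [1.0, 2.0, 1.0],
           [0.0, 1.0, 2.0]]
    values, fractions = explained_variance(toy)
    for i, (v, f) in enumerate(zip(values, fractions), start=1):
        print(f"PC{i}: eigenvalue={v:.3f}  variance explained={100 * f:.1f}%")
```

For a real alignment you would replace the toy matrix with the exported one, for example by reading a comma-separated file with np.loadtxt (the file name and format would depend on how you export it).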

Of course, you could also try JDet, which provides a more detailed PCA analysis and is oriented towards sequence-structure analysis of proteins. Take a look here:
https://code.google.com/p/jdet/

Let us know how you get on!
Jim.

···

On 15/01/2015 15:57, James Starlight wrote: