PCA

I use the PCA calculation within Jalview to analyze a variety of sequence motifs. I would like to report the percentage of points that fit to the principal components. I see eigenvalues in the output files, but is there a way to determine the percentage of points as a way to report the quality of the PCA?

Thanks,
Jessica

Hi Jessica.

Jalview doesn't have the ability to use the PCA model to analyse the alignment further, and that is exactly why I included the ability to output the original matrix and the eigenvector matrix, so that you can analyse the data in some other program.

If you have an idea of the kind of analysis you want to do, then it might be possible to find a student to implement it within Jalview. I guess the obvious thing might be to provide a scree plot and do an 'elbow analysis' to indicate the top N most informative eigenvectors, in order to report the percentage of variance covered by those N components. Going further, it sounds like you might want to build a PCA with a set of seed sequences and plot the positions of other aligned sequences in the PCA, which would also allow you to give some kind of goodness of fit for each point.
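
As a rough illustration of the "percentage of variance covered" idea, here is a minimal Python sketch that could be run outside Jalview on the eigenvalues from the PCA output. It is only a sketch under stated assumptions: the function names, the example eigenvalues and the 90% cutoff are made up for illustration, a fixed cumulative threshold stands in for a visual 'elbow' judgement on a scree plot, and any negative eigenvalues are simply treated as zero.

```python
def explained_variance(eigenvalues):
    """Return (per-component %, cumulative %) for a list of PCA eigenvalues."""
    # Sort largest first. Assumption: negative eigenvalues are treated as zero.
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total == 0:
        raise ValueError("eigenvalues sum to zero")
    percent = [100.0 * v / total for v in vals]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def components_needed(eigenvalues, threshold=90.0):
    """Smallest N whose top-N components cover at least `threshold` % of variance."""
    _, cumulative = explained_variance(eigenvalues)
    for n, c in enumerate(cumulative, start=1):
        # Small tolerance guards against floating-point rounding at the cutoff.
        if c >= threshold - 1e-9:
            return n
    return len(cumulative)


# Hypothetical eigenvalues, for illustration only (not from a real Jalview run).
example = [4.0, 3.0, 2.0, 1.0]
percent, cumulative = explained_variance(example)
top_n = components_needed(example, threshold=90.0)
```

Plotting `percent` against component number would give the scree plot; `cumulative[N - 1]` is the figure to report for the top N components.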

We've not heard yet if the Google Summer of Code will run again this year, but if it does, this would make a great project for a GSoC student!
Jim.

It sounds like you are asking about projecting sequences onto the PCA model in order to assess how they fit, which would be analogous to a common use for ordination based models where new observations are evaluated to see how well they fit some existing model.

On Fri Jan 25 21:50:48 2013, Jessica Richard wrote:

oops - looks like I'd got some xtra text there:

On Sat Jan 26 15:45:16 2013, Jim Procter wrote:

It sounds like you are asking about projecting sequences onto the PCA
model in order to assess how they fit, which would be analogous to a
common use for ordination based models where new observations are
evaluated to see how well they fit some existing model.

sorry 'bout that. Also - you might want to check out JDet (at the JDet home page and https://code.google.com/p/jdet/) - since that also allows a residue-level PCA to be performed on an alignment. I'd be quite tempted to incorporate JDet's analysis and visualization components in Jalview, in fact.

Jim

Thanks for all of the suggestions Jim! I’m looking into this now.

Jessica

From: Jim Procter jprocter@compbio.dundee.ac.uk
To: jalview-discuss@jalview.org
Sent: Saturday, January 26, 2013 9:49 AM
Subject: Re: [Jalview-discuss] PCA

oops - looks like I’d got some xtra text there:

On Sat Jan 26 15:45:16 2013, Jim Procter wrote:

It sounds like you are asking about projecting sequences onto the PCA
model in order to assess how they fit, which would be analogous to a
common use for ordination based models where new observations are
evaluated to see how well they fit some existing model.

sorry 'bout that. Also - you might want to check out JDet (at
JDet - Home page and https://code.google.com/p/jdet/) -
since that also allows a residue level PCA to be performed on an
alignment… I’d be quite tempted to incorporate JDet’s analysis and
visualization components in Jalview, in fact.

Jim


Jalview-discuss mailing list
Jalview-discuss@jalview.org
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss