For large sequence alignments (10,000s of sequences), the PCA calculation seems to require a lot of memory but is not particularly CPU intensive, and it takes a number of days. I was wondering if there is a practical limit to the number of sequences for which a PCA can be calculated in Jalview? Is there also any way to estimate how long the calculation could take?
Adrian
Hi Adrian.
You are absolutely correct that Jalview’s PCA calculation is quite inefficient. We haven’t really focused on making it faster or more memory efficient because we always planned to replace the in-app calculation with a web service, which is something we can now do more easily with the launch of slivka.
It would certainly be possible to perform some benchmarking - I would expect Jalview’s calculation to be fairly predictable in its performance barring issues due to the JVM’s garbage collector. When @morellthomas added PaSiMap they also implemented a progress bar/estimator which we could look at improving.
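As a rough sketch of what such benchmarking could look like: time the PCA on a few smaller subsets of the alignment, fit a power law t = c * n^k to those timings, and extrapolate to the full alignment. The function names below are hypothetical and nothing here calls Jalview itself. The power-law scaling is an assumption, as is the idea that a dense n x n matrix of 8-byte values dominates memory use; neither has been checked against Jalview's code.

```python
import math


def fit_power_law(sizes, seconds):
    """Fit t = c * n**k by least squares on log(t) = log(c) + k * log(n).

    sizes:   numbers of sequences in each benchmark run (your own measurements)
    seconds: wall-clock time of each run, in the same order
    Returns (c, k).
    """
    if len(sizes) != len(seconds) or len(set(sizes)) < 2:
        raise ValueError("need timings for at least two different sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                       # fitted scaling exponent
    c = math.exp(mean_y - k * mean_x)   # fitted constant factor
    return c, k


def estimate_seconds(c, k, n):
    """Extrapolate the fitted power law to n sequences."""
    return c * n ** k


def dense_matrix_gib(n, bytes_per_value=8):
    """Memory for one dense n x n matrix (ASSUMED storage layout), in GiB."""
    return n * n * bytes_per_value / 2 ** 30
```

A fitted exponent k well above 2 would suggest that run time, rather than memory, becomes the limiting factor first as the number of sequences grows. Extrapolating far beyond the measured sizes is unreliable, particularly once garbage collection or swapping starts to dominate.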
Please get in contact directly via my Dundee email if you’d like to explore other options!
Jim