distance initialized to zero in average seq id tree?

Dear Jalview experts,

I recently tried to draw a tree from an MSA in which some of the sequences were completely unaligned with each other (e.g. a sequence fragment that aligned only to the N-terminal region and one that aligned only to the C-terminal region). Oddly, these two sequences were placed together on the (average distance % id) tree, even though their distance would be undefined. This suggests that there is a bug in the Jalview tree code that (perhaps) initializes the distance to zero. Attached is the MSA that causes the problem. Two of the sequences that should not be close together are 1w0tA and MOUSEAkirin2.

Best regards,

Daron

mafftash1.fa (3.5 KB)

Daron Standley wrote:

I recently tried to draw a tree from an MSA in which some of the
sequences were completely unaligned (e.g. a sequence fragment that
aligned only to the N-terminal region and one that aligned only to the
C-terminal region). Oddly, these two sequences were placed together on
the (average distance % id) tree, even though their distance would be
undefined. This suggests that there is a bug in the Jalview tree code
that (perhaps) initializes the distance to zero. Attached is the MSA
that causes the problem. Two of the sequences that should not be close
together are 1w0tA and MOUSEAkirin2.

Ah, yes. It does look like something strange is going on there; this is
a good test case. I verified Jalview's UPGMA implementation some time
ago, and it looks like BLOSUM62 behaves as expected, so it's probably
some strangeness with the PID distance function when two sequences are
completely unaligned. I'll add it to the bug tracker.

Jim.
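
For anyone following the thread, here is a minimal sketch of how a percent-identity distance could end up as zero for two rows that share no aligned columns. This is not Jalview's actual code: the function names, the gap characters and the toy sequences are all assumptions made up for illustration. The "naive" variant only mirrors the guess in the report (an accumulator that starts at zero and is never touched), while the "safe" variant returns None when the distance is undefined, so that the caller has to decide how to handle that pair.

```python
from typing import Optional

# Assumed gap symbols; real alignments may use others.
GAP_CHARS = "-. "


def naive_pid_distance(a: str, b: str) -> float:
    """Hypothetical buggy variant: counts mismatches over shared columns.

    If the two rows never have residues in the same column, the loop
    body never increments anything and the initial 0.0 is returned,
    which a tree builder would read as "identical".
    """
    mismatches = 0.0
    for x, y in zip(a, b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns where either row is gapped
        if x.upper() != y.upper():
            mismatches += 1.0
    return mismatches


def safe_pid_distance(a: str, b: str) -> Optional[float]:
    """Return 1 - (fraction of identical residues) over shared columns.

    Returns None when there are no shared columns, i.e. when the
    distance is undefined.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    shared = 0
    identical = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        shared += 1
        if x.upper() == y.upper():
            identical += 1
    if shared == 0:
        return None  # no overlap: distance is undefined
    return 1.0 - identical / shared


# Toy rows (made up): one fragment covers only the N-terminal columns,
# the other only the C-terminal columns.
n_term = "MKTAY-----"
c_term = "-----QWERT"

print(naive_pid_distance(n_term, c_term))  # 0.0, looks like a perfect match
print(safe_pid_distance(n_term, c_term))   # None, flagged as undefined
```

How an undefined pair should then be treated in the tree (for example, assigning it the maximum distance or excluding the pair) is a separate design choice that this sketch does not settle.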