Explanation of Weighted LCA settings - readMagnitude

katie · June 20, 2017, 10:35pm

Hello,

I’m trying out the MEGAN6 Weighted LCA algorithm for the first time after previously using MEGAN5’s ‘LCA of an X percent’. We currently cluster our reads into OTU sequences, which we then BLAST and analyze through MEGAN, so each OTU sequence corresponds to a certain number of reads in the dataset. To use the weighted LCA algorithm, would you recommend taking these read numbers into account, perhaps by setting Read Assignment Mode to ‘readMagnitude’ and using the correct “weight=XX” terminology in our OTU headers?

Currently I’m a little concerned that we are getting misassignments (see below) where OTUs have ‘good’ hits to multiple species. In the case below, MEGAN5 would annotate to G. parvulum while MEGAN6 with wLCA annotates to ‘A. sp. AORF-2015’. Within MEGAN6 there is one other OTU in this dataset that is annotated to G. parvulum, while there are 6 OTUs annotated to ‘A. sp. AORF-2015’, so I’m assuming that is why MEGAN6 with wLCA annotates to ‘A. sp. AORF-2015’?

I didn’t see any mention in the MEGAN6 manual of how the ‘readMagnitude’ parameter affects LCA placement; if there is a further description online, please point me to it. I’m assuming it would weigh each OTU sequence as that many reads when calculating read placement.

I’m also curious as to your opinion on how this affects the accuracy of taxon assignment. Should it matter more in a presence/absence manner whether a species is detected by a unique sequence (so that more unique OTUs being assigned to a taxon makes it more likely to be truly present), or should the abundance of each unique sequence be taken into account, so that one very abundant unique sequence would make it more likely that other sequences are annotated as that taxon? Please let me know if I’m misunderstanding how the weighted LCA works.

Thank you!
Katie

Daniel · June 24, 2017, 9:25am

Hi Katie,

I just took a look at the implementation of the weighted LCA in MEGAN6.
The current implementation does not take magnitudes into account.
I have now modified the code so that it does take magnitudes (or whichever read assignment mode is chosen) into account.
This has been tested as follows:
Read1 has magnitude 100 and has one alignment to species A
Read2 has magnitude 1 and has one alignment to species B
Read3 has magnitude 1 and has one alignment to A and one alignment to B
The weighted LCA assigns Read1 to A, Read2 to B and Read3 to A. The naive LCA would assign Read3 to lca(A,B).

The idea of the weighted LCA is to weight references by the number of reads that uniquely align to the taxon associated with the given reference. So yes, the weighted LCA should take OTU abundances into account.
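
To make the test case above concrete, here is a minimal Python sketch of the idea. It is not MEGAN's code: the toy taxonomy (`PARENT`), the function names and the `percent` threshold of 80 are all assumptions made for illustration. Each taxon is weighted by the total magnitude of the reads that hit only that taxon, and each read is then placed on the deepest node that covers at least `percent` of the weight of its hits.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); "root" has no parent.
PARENT = {"A": "genus", "B": "genus", "genus": "root"}


def path_to_root(taxon):
    """Return the list of nodes from taxon up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def naive_lca(taxa):
    """Lowest node that lies on the root path of every taxon."""
    paths = [path_to_root(t) for t in taxa]
    common = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:  # walk upward from the first taxon
        if node in common:
            return node


def weighted_lca(reads, percent=80.0):
    """reads maps a read name to (magnitude, list of aligned taxa).

    Assumption of this sketch: percent is above 50, so at most one
    node per depth can qualify.
    """
    # Pass 1: weight each taxon by the magnitude of its unique reads.
    weight = defaultdict(float)
    for magnitude, taxa in reads.values():
        if len(set(taxa)) == 1:
            weight[taxa[0]] += magnitude

    # Pass 2: place each read on the deepest node that covers at
    # least `percent` of the total weight of its aligned taxa.
    assignment = {}
    for name, (_, taxa) in reads.items():
        taxa = sorted(set(taxa))
        total = sum(weight[t] for t in taxa)
        if total == 0:  # no unique evidence: fall back to naive LCA
            assignment[name] = naive_lca(taxa)
            continue
        covered = defaultdict(float)
        for t in taxa:
            for node in path_to_root(t):
                covered[node] += weight[t]
        candidates = [n for n, w in covered.items()
                      if 100.0 * w / total >= percent]
        assignment[name] = max(candidates,
                               key=lambda n: len(path_to_root(n)))
    return assignment


reads = {
    "Read1": (100, ["A"]),
    "Read2": (1, ["B"]),
    "Read3": (1, ["A", "B"]),
}
print(weighted_lca(reads))
print(naive_lca(["A", "B"]))
```

In this sketch Read3 goes to A because A carries 100 of the 101 units of weight among its hits. If Read1 had magnitude 1 instead, A and B would each cover only 50%, and Read3 would move up to their common parent, which is what the naive LCA gives.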

This update will be available later today.