Which LCA numbers are which?

I am new to MEGAN6. I am currently using the Windows version, although I can switch to Linux if required.

I am doing taxonomic analysis, and I am going to publish the results, so I need to know exactly what I am reporting.

The MEGAN6 publication (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004957) mentions several LCA algorithms: the standard (or naive) LCA, the weighted LCA, and the projection LCA.

I imported my data and ran taxonomic analysis only.
At first, I ran only the standard LCA analysis. In the tree view, I can see the numbers assigned and the sum of all the terms below at any given taxon. Can you confirm that these numbers are the naive LCA?

When I click the RANK -> Genus button, go to File->Export->CSV, and select “taxonName_to_count”, “summarized”, and “tab”, the file I generate contains the assigned counts, not the summarized counts. Am I doing something wrong or is this an error?

I can repeat the analysis and check “use weighted LCA.” I then have a parameter which defaults to 0.80, which is not described in the manual. What does this term do?
Is the sum now listed in the taxonomy tree the weighted LCA?

How do I trigger a projection calculation? Where do I find the scores?

At first, I ran only the standard LCA analysis. In the tree view, I can see the numbers assigned and the sum of all the terms below at any given taxon. Can you confirm that these numbers are the naive LCA?

Yes, these numbers are computed by the naive LCA.
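
To make the difference between the two numbers in the tree view concrete, here is a minimal sketch of a naive LCA on a toy taxonomy. This is not MEGAN's code: the taxonomy, the read names and the helper functions (`path_to_root`, `naive_lca`, `summarize`) are all made up for illustration, and the alignments are assumed to have already been filtered for significance. The "assigned" count is the number of reads placed directly on a node; the "summarized" count adds everything assigned below it.

```python
from collections import Counter

# Hypothetical toy taxonomy, stored as child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "Escherichia coli": "Escherichia",
    "Salmonella": "Bacteria",
}


def path_to_root(taxon):
    """Return the list of nodes from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def naive_lca(taxa):
    """Return the lowest node that lies on or above every taxon in `taxa`."""
    taxa = list(taxa)
    common = set(path_to_root(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(path_to_root(taxon))
    # Walking up from the first taxon, the first shared node is the lowest one.
    for node in path_to_root(taxa[0]):
        if node in common:
            return node


def summarize(assigned):
    """Add each node's assigned count to the node itself and all its ancestors."""
    summarized = Counter()
    for taxon, count in assigned.items():
        for node in path_to_root(taxon):
            summarized[node] += count
    return summarized


# Hypothetical reads, each with the taxa of its significant alignments.
reads = {
    "r1": ["Escherichia coli"],
    "r2": ["Escherichia coli", "Escherichia"],
    "r3": ["Escherichia coli", "Salmonella"],
}

assigned = Counter(naive_lca(taxa) for taxa in reads.values())
summarized = summarize(assigned)

for node in PARENT:
    print(f"{node}: assigned={assigned[node]}, summarized={summarized[node]}")
```

In this toy case Escherichia has one read assigned directly to it but a summarized count of two, because the read assigned to Escherichia coli sits below it.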

When I click the RANK → Genus button, go to File->Export->CSV, and select “taxonName_to_count”, “summarized”, and “tab”, the file I generate contains the assigned counts, not the summarized counts. Am I doing something wrong or is this an error?

Thanks, this IS a bug. I have identified and fixed it, and the fix will be available with the next release (by Monday or Tuesday).

I can repeat the analysis and check “use weighted LCA.” I then have a parameter which defaults to 0.80, which is not described in the manual. What does this term do?
Is the sum now listed in the taxonomy tree the weighted LCA?

The weighted LCA operates by performing two rounds of analysis.
First:
Each reference sequence is assigned a weight of 1.
For each read r, if all its significant alignments (i.e. those that exceed the given minScore threshold and are within the given topPercent of the best alignment for that read) are to reference sequences labeled by the same species, then the weight of all those reference sequences is incremented by 1.
After completion of this calculation, references are weighted by the number of reads that align only to them (or their species).
Second:
Suppose the given parameter is 80% (i.e. 0.80).
For each read r, let Z be the total weight of all references to which r has a significant alignment. The weighted LCA places r on the lowest node such that the aligned references below the node carry at least 80% of Z.
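
The two rounds described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not MEGAN's implementation: the taxonomy, the reference names (`refA`, `refB`, `refC`), the reads and the function names are hypothetical, every reference is assumed to be labeled with a species, and each read's alignment list is assumed to contain only its significant alignments (already filtered by minScore and topPercent).

```python
# Hypothetical toy taxonomy, stored as child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Escherichia": "root",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}

# Hypothetical reference sequences and the species each one is labeled with.
REF_SPECIES = {"refA": "E. coli", "refB": "E. coli", "refC": "E. albertii"}


def path_to_root(taxon):
    """Return the list of nodes from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def reference_weights(reads):
    """Round one: start every reference at weight 1, then add 1 to each
    reference hit by a read whose alignments all fall in a single species."""
    weights = {ref: 1 for ref in REF_SPECIES}
    for refs in reads.values():
        if len({REF_SPECIES[ref] for ref in refs}) == 1:
            for ref in set(refs):
                weights[ref] += 1
    return weights


def weighted_lca(refs, weights, percent=0.80):
    """Round two: place a read on the lowest node whose subtree carries at
    least `percent` of the total weight Z of the read's aligned references."""
    refs = set(refs)
    total = sum(weights[ref] for ref in refs)  # this is Z
    covered = {}
    for ref in refs:
        # A reference contributes its weight to its species and every ancestor.
        for node in path_to_root(REF_SPECIES[ref]):
            covered[node] = covered.get(node, 0) + weights[ref]
    candidates = [node for node, w in covered.items() if w >= percent * total]
    # "Lowest" is taken to mean the candidate farthest from the root.
    return max(candidates, key=lambda node: len(path_to_root(node)))


# Hypothetical reads, each with the references of its significant alignments.
reads = {
    "r1": ["refA"],
    "r2": ["refA", "refB"],
    "r3": ["refA"],
    "r4": ["refB"],
    "r5": ["refA", "refC"],
    "r6": ["refA"],
}

weights = reference_weights(reads)
for name, refs in reads.items():
    print(name, "->", weighted_lca(refs, weights))
```

Read `r5` aligns to references from two species, so a naive LCA would place it on the genus. Here `refA` has accumulated most of the weight, so `refA` alone covers more than 80% of Z and the read is placed on E. coli instead. Setting `percent=1.0` in this sketch requires all of the weight to be covered and puts `r5` back on the genus.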

How do I trigger a projection calculation? Where do I find the scores?

In the main viewer, select the Options->Project assignments to rank… menu item.

Thank you for the help. This is very good information.

Am I correct to assume that, after running the weighted LCA or the projection LCA, the summarized count is the weighted LCA or projection LCA count, respectively?

Would I be correct to conclude that in order to do all three counting methods, I must calculate them separately and export them separately?

For the weighted LCA, the counts refer to the number of reads assigned; the weights do not play a role in this.
Similarly, for the projection, the counts refer to the number of reads.

I was able to find the Projected counts; they are displayed in a new window.

I am still confused about where the weighted LCA is reported.

Think of LCA and weighted LCA as two alternative algorithms. You use either one or the other, so don’t expect to see two sets of numbers.
Also, note that the weights are only used to guide the mapping of reads onto nodes, so, in the end, you won’t actually see the weights. You will only notice that some of the reads are assigned to different nodes.