Summarized count of a node does not equal the sum of the summarized counts of its subnodes

Hi MEGAN community,

I am relatively new to MEGAN. So far I have meganized my DIAMOND daa files and now want to analyze the data in the graphical interface. I used the compare function for all the samples. As a first step, I want to check the percentage of unassigned reads in the taxonomic or functional annotation for each sample. For that I used export…text csv format… taxonname to count … summarized…
However, I realized that for each sample, if I manually sum the counts (that is also the number of reads, right?) of all subnodes of a given node, the result does not equal the count of the given node. For example,
cellular organisms != Bacteria+Eukaryota
NCBI != Bacteria+Eukaryota+Viruses+Not assigned

Why is that? Did I miss something?

Cheers,
Yun

OK, I think I get it now. If reads cannot be resolved to one of the subnodes, they are assigned to the node itself.
Thus cellular_organisms_summed = Bacteria_summed + Eukaryota_summed + cellular_organisms_assigned.
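
To make the relation above concrete, here is a minimal Python sketch. The tree, the read counts, and the names `children`, `assigned` and `summed` are made up for illustration; they are not MEGAN's export format or API.

```python
# Toy taxonomy (hypothetical counts, not taken from a real sample).
children = {
    "cellular organisms": ["Bacteria", "Eukaryota"],
    "Bacteria": [],
    "Eukaryota": [],
}

# "Assigned" = reads placed directly on a node, i.e. reads that could not
# be resolved to any of its subnodes.
assigned = {"cellular organisms": 50, "Bacteria": 700, "Eukaryota": 250}


def summed(node):
    """Reads assigned to the node itself plus all reads assigned below it."""
    return assigned.get(node, 0) + sum(summed(c) for c in children.get(node, []))


total = summed("cellular organisms")
subnodes = summed("Bacteria") + summed("Eukaryota")

print(total)             # 1000
print(subnodes)          # 950
print(total - subnodes)  # 50, the reads assigned to "cellular organisms" itself
```

The difference between a node's summarized count and the sum over its subnodes is exactly the node's own assigned count.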

But that brings me to another question: if I want the percentage of Bacteria reads, how should I calculate it? Just bacteria_summed / total reads might not be appropriate then. How should I interpret the reads that are assigned to cellular_organisms but cannot be resolved to either Bacteria or Eukaryota?

That is a very good question. Your point is that some of the reads that are assigned to cellular organisms should be counted as bacteria.

One way to address this is to use the projection method implemented as Options->Project Assignments to Rank. With this, all reads are projected onto a specified rank: reads assigned above that rank are distributed among the nodes below in proportion to the number of reads assigned to, or below, each of those nodes.


Hi Daniel,

Thank you, this function is very useful for me! Is this a different algorithm from LCA? I did not see much description of it in the manual.

This is a simple (but unpublished) algorithm, called the “taxonomic projection”:

First, select a rank R.
Then, starting at the root node, for each node v above rank R, push the number of reads assigned to v down onto the children w of v, in proportion to the number of reads assigned on or “below” each child w, that is, to w or to any descendant of w.
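
The two steps above can be sketched in Python as follows. This is only an illustration of the idea as described in this thread, not MEGAN's actual implementation. The function name `project_to_rank`, the data layout and the toy counts are hypothetical. Two details are assumptions of the sketch rather than something stated above: reads below rank R are collapsed onto their ancestor at rank R, and reads on a branch that never reaches rank R stay where they are.

```python
def project_to_rank(children, assigned, rank_of, target_rank, root):
    """Push reads assigned above target_rank down to nodes at target_rank."""

    def summed(node):
        # Reads assigned to the node or to any of its descendants.
        return assigned.get(node, 0) + sum(summed(c) for c in children.get(node, []))

    projected = {}

    def push(node, count):
        # count = reads on this node, including reads pushed down from above.
        kids = children.get(node, [])
        if rank_of.get(node) == target_rank:
            # Assumption: everything below rank R is collapsed onto this node.
            projected[node] = count + sum(summed(k) for k in kids)
            return
        weights = {k: summed(k) for k in kids}
        total = sum(weights.values())
        if total == 0:
            # Assumption: nothing below to project onto, so reads stay here.
            if count:
                projected[node] = count
            return
        for kid, weight in weights.items():
            if weight > 0:
                # Child receives a share proportional to reads on or below it.
                push(kid, assigned.get(kid, 0) + count * weight / total)

    push(root, assigned.get(root, 0))
    return projected


# Toy example with hypothetical counts; "domain" is just a label used here.
children = {
    "cellular organisms": ["Bacteria", "Eukaryota"],
    "Bacteria": ["Proteobacteria"],
    "Proteobacteria": [],
    "Eukaryota": [],
}
assigned = {
    "cellular organisms": 80,
    "Bacteria": 150,
    "Proteobacteria": 450,
    "Eukaryota": 200,
}
rank_of = {"Bacteria": "domain", "Eukaryota": "domain"}

result = project_to_rank(children, assigned, rank_of, "domain", "cellular organisms")
print(result)  # {'Bacteria': 660.0, 'Eukaryota': 220.0}
print(result["Bacteria"] / sum(result.values()))  # 0.75
```

In this toy example, the 80 reads sitting on "cellular organisms" are split 60/20 between Bacteria and Eukaryota, because 600 and 200 reads are assigned on or below them. The Bacteria percentage is then computed from the projected counts, so the total number of reads is preserved.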
