Read counts do not tally

  1. Will the BlastX result be more similar to the summed reads or to the assigned reads?


  2. Why do the summed or assigned read counts of a node's child lineages not add up to the assigned read count of its parent taxonomic rank? E.g., 2487 (Desulfobacterales) + 1726 (unclassified Deltaproteobacteria) = 4213, which does not tally with the assigned read count of the parent rank, 4732 (Deltaproteobacteria), as shown in the diagram above.

Hey,

I’m not from MEGAN, but I was wondering about this earlier as well. I believe it works as follows:

4,732 reads are assigned to Deltaproteobacteria itself but not to any lower taxon, probably because they match several references in different lower clades.
2,487 reads were specific enough to be assigned to Desulfobacterales (including its lower taxa), and likewise 1,726 to unclassified Deltaproteobacteria.

If you add those numbers (4,732 + 2,487 + 1,726) you get 8,945, which is the summed count for Deltaproteobacteria.

Cheers,
Meriam

Hi Meriam,

Thank you so much for your reply.
Do you know how I can export all of the summed read counts?
At the moment I have to key the data into a table one by one by hand, which is very time-consuming.

Cheers,
Sinhui

Hey Sinhui,

No, sorry, I don’t know!

Hi, also not from MEGAN, but here is how I did it:

Select the desired node (or several, by holding Shift and clicking each target node), right-click → Uncollapse subtree, then Select → Leaves Below, then Export → Text (CSV) Format. Choose comma or tab as the delimiter, name the file, and save it in the folder you want.

Be sure to manually change the file extension to “.csv” if you select comma as the delimiter. It seems this still needs to be built in to make the process smoother.
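Once you have the exported file, you don't need to retype anything. Here is a rough Python sketch for loading it into a table. I'm assuming each line holds a taxon name followed by a count, separated by the delimiter you picked. I haven't verified the exact column layout MEGAN writes, so check your file first. The function name parse_counts and the sample text are just made up for illustration.

```python
import csv
import io


def parse_counts(text, delimiter=","):
    """Parse 'name<delimiter>count' lines into a {name: count} dict.

    Assumed layout: first column is the taxon name, second is the count.
    Rows whose second column is not a number (e.g. a header) are skipped.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            value = float(row[1])
        except ValueError:
            continue  # not a number, probably a header row
        counts[row[0].strip()] = value
    return counts


# Sample text standing in for an exported file (numbers taken from this thread).
sample = '"Desulfobacterales",2487\n"unclassified Deltaproteobacteria",1726\n'
table = parse_counts(sample)
for name, value in table.items():
    print(f"{name}\t{value:g}")
```

For a real file, read it with open(path).read() and pass delimiter="\t" if you exported with tabs.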

Good luck!

The summarized count for a node equals the sum of the assigned count for the node plus the summarized counts of all its children, e.g.

for Desulfobacterales we have 2487=1538+609+340
and for Deltaproteobacteria we have 8945=2487+1726+4732.
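
The rule can be written as a small recursion. The Python sketch below is only an illustration, not MEGAN's code. The Node class and the summarized function are hypothetical names. To keep it short, Desulfobacterales and unclassified Deltaproteobacteria are entered as leaves carrying their whole summarized counts, i.e. their own subtrees are collapsed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    assigned: int  # reads assigned directly to this node
    children: list = field(default_factory=list)


def summarized(node):
    """Assigned count of the node plus the summarized counts of its children."""
    return node.assigned + sum(summarized(child) for child in node.children)


# Counts from this thread; the two children are collapsed subtrees,
# so their "assigned" field holds their full summarized count here.
delta = Node(
    "Deltaproteobacteria",
    4732,
    [
        Node("Desulfobacterales", 2487),
        Node("unclassified Deltaproteobacteria", 1726),
    ],
)

print(summarized(delta))  # 8945
```

This also shows why 2487 + 1726 = 4213 does not match 4732: the 4732 reads are those assigned to Deltaproteobacteria itself, not the total of its children.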