I tried to normalize the “absolute” values manually using the equation (|C| / |S|) × m, where |C| is a class's read count, |S| is the total number of reads in the sample, and m is the smallest sample size. I found minor differences between MEGAN's normalized read counts and my calculation. Please refer to the table below:
As you can see, there is a difference between the manually calculated and the MEGAN-reported normalized read counts. Can you tell me if I missed anything, or why there is this difference?
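For concreteness, the manual calculation described above can be sketched in Python. This is only an illustration: the function name and all numbers are made up, and m is taken to be the smallest sample size, matching MEGAN's normalization target.

```python
# Hypothetical sketch of the manual normalization (|C| / |S|) * m.
# Function name and numbers are invented for illustration.
def normalize(count, sample_size, smallest_sample_size):
    """Scale a raw read count |C| from a sample of |S| total reads
    down to the smallest sample size m."""
    return count * smallest_sample_size / sample_size

# e.g. a class with 100,000 reads in a 2,500,000-read sample,
# normalized to a smallest sample of 2,180,466 reads:
print(normalize(100_000, 2_500_000, 2_180_466))  # -> 87218.64
```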
I took another look at the code and at your numbers. The discrepancies are larger than what I would expect. Basically, I would only expect to see differences due to rounding. I just tried this out on a collection of files to confirm this.
When doing normalization, MEGAN reports the following line in the message window:
`Normalizing to: N reads per sample.`
What was N for your data? Was it exactly 2,180,466?
I think I want to go back to my original explanation for the discrepancies between counts, e.g. 87918 vs. 88072.
MEGAN normalizes by scaling counts down to the smallest input sample size. However, the code rounds in a number of intermediate steps. I have rewritten the code to avoid this rounding, and with the next release of MEGAN, 6.18.9, you should see none of the differences that you have pointed out.
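To illustrate why intermediate rounding can produce discrepancies like the one above, here is a small Python sketch. This is not MEGAN's actual code, and the counts and sample sizes are invented; it only demonstrates the general effect that rounding each scaled class count before summing can disagree with scaling the summed count once.

```python
# Sketch (not MEGAN's actual code) of how rounding at intermediate
# steps shifts normalized counts. All numbers below are invented.

def scale(count, sample_size, target):
    # Scale a raw count down to the target (smallest) sample size.
    return count * target / sample_size

sample_size = 2_500_000
target = 2_180_466              # smallest input sample size
counts = [10002, 10009, 10017]  # per-class read counts

# Round each scaled count, then sum (rounding in intermediate steps):
rounded_each = sum(round(scale(c, sample_size, target)) for c in counts)
# Sum first, then scale and round once (no intermediate rounding):
rounded_once = round(scale(sum(counts), sample_size, target))

print(rounded_each, rounded_once)  # -> 26191 26190
```

Here each of the three scaled counts happens to round up, so the sum of the rounded values exceeds the once-rounded sum by one; with thousands of classes, such differences accumulate.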