The current normalization method works by sub-sampling every sample down to the smallest count. This approach is easy to criticize nowadays, since a large part of the dataset is thrown away. Furthermore, there are several well-known papers directly criticizing the use of rarefying/sub-sampling for sequencing data.
This point is typically brought up by reviewers, and the dataset then has to be normalized differently and re-plotted to show that the results hold. Another common normalization method, which does not require assembled contigs or estimated gene lengths in the calculation, is counts per million (CPM, i.e. relative proportion × 1 000 000). It is much less criticized in the community since all data is kept, and it is used, for example, in the edgeR Bioconductor package, where it is also combined with trimmed mean of M-values (TMM) normalization.
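Just to illustrate how little machinery CPM needs, here is a minimal Python sketch (not MEGAN's or edgeR's actual implementation; the count matrix is made up):

```python
import numpy as np

# Toy count matrix: rows = functional classes, columns = samples.
# Values are made-up absolute read counts, as exported from any classifier.
counts = np.array([
    [1500,  900, 4200],
    [ 300, 1100,  800],
    [  50,   20,  130],
], dtype=float)

# Library size per sample (total assigned reads in each column).
library_sizes = counts.sum(axis=0)

# Counts per million: relative proportion within each sample x 1e6.
# (edgeR's TMM additionally rescales each library size by a per-sample
# normalization factor before this division.)
cpm = counts / library_sizes * 1_000_000

print(cpm.round(1))
```

No reads are discarded here, which is the main argument for CPM over sub-sampling to the smallest library.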
Anyway, what I wanted to bring up is that MEGAN only offers one type of normalization, the sub-sampling approach. Because of this, I often end up using MEGAN just to extract the absolute counts with functional classifications, which are then normalized and analyzed in other software. This seems a bit unnecessary to me, as CPM/TMM values could simply be offered as an alternative normalization method in the MEGAN software and the supplied command-line tool compute-comparison.
Here’s a paper from 2018 that I found on the subject:
Comparison of normalization methods for the analysis of metagenomic gene abundance data