Advice on when to use "Ignore all unassigned reads" for Sample Comparison

Bernhard · December 30, 2022, 7:15pm

Hello!

I am using the Compare functionality of MEGAN (compute-comparison).

Can you give some advice on when I would want to use “Ignore all unassigned reads” (ignore all reads placed on the no-hits, not-assigned, or low-complexity node)?
Is there some general recommendation?

In the discussion on “Mode of Normalization”, I found your comment on normalization:

If you select ignore unassigned, then normalization is with respect to (the smallest number of) assigned reads only; otherwise normalization is with respect to (the smallest number of) all reads.

That is, when normalization is used, the choice to ignore unassigned reads influences the taxa and function computations.
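To make the difference concrete, here is a minimal Python sketch of how I understand the quoted rule. This is not MEGAN's code: the `normalize` function, the node labels in `UNASSIGNED`, and all counts are made up for illustration, and I am assuming that normalization simply scales each sample down to the size of the smallest sample.

```python
# Hypothetical labels for the nodes that "Ignore all unassigned reads" drops.
UNASSIGNED = {"No hits", "Not assigned", "Low complexity"}


def normalize(samples, ignore_unassigned):
    """Scale every sample down to the read count of the smallest sample.

    Assumption: this mirrors the quoted rule only in spirit; it is not
    taken from MEGAN's implementation.
    """
    if ignore_unassigned:
        # Drop the unassigned nodes before computing sample sizes.
        samples = {
            name: {node: n for node, n in counts.items() if node not in UNASSIGNED}
            for name, counts in samples.items()
        }
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    target = min(totals.values())  # the smallest number of (assigned or all) reads
    return {
        name: {node: n * target / totals[name] for node, n in counts.items()}
        for name, counts in samples.items()
    }


# Made-up counts: both samples have the same 3:1 ratio among assigned reads,
# but sample B has far more unassigned reads.
samples = {
    "A": {"Bacteroides": 600, "Escherichia": 200, "Not assigned": 200},
    "B": {"Bacteroides": 300, "Escherichia": 100, "Not assigned": 1600},
}

with_unassigned = normalize(samples, ignore_unassigned=False)
assigned_only = normalize(samples, ignore_unassigned=True)

print(with_unassigned["A"]["Bacteroides"], with_unassigned["B"]["Bacteroides"])
print(assigned_only["A"]["Bacteroides"], assigned_only["B"]["Bacteroides"])
```

With these invented numbers, the Bacteroides counts differ between the two samples when all reads are used for normalization, but come out equal when only assigned reads are used, so the option can change what looks like a difference between samples.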

Thanks, Best regards!

Daniel · January 4, 2023, 9:39am

I think this depends on the details of the study. For similar samples from similar “theaters of activity”, I would assume that the differences in unassigned counts can be safely ignored and thus suppressed. When comparing samples from different theaters of activity, ignoring the unassigned counts is probably not a good idea. However, I haven’t studied this and so can’t give real guidance.