When to use normalized vs absolute counts for viral analyses

rrodgers · March 6, 2023, 9:42pm

Hello, I am running a pipeline for viral metagenomic analyses. We are interested in the phage richness of each sample, as well as potential biomarkers that distinguish our study groups. Because our pipeline builds viral contigs, I generated RMA files using the long reads method (-lg true), and from these I generated two compare files: one with absolute counts and one with normalized counts. I ignored all unassigned reads in both sets. My plan is to use the normalized counts for investigating richness and the absolute counts for biomarker analyses, since we intend to use DESeq2, which runs its own normalization. I am new to MEGAN, so I wanted to ask whether there are any recommendations on when to use absolute versus normalized counts, in case I am thinking about this incorrectly or misunderstanding something. Thank you!

Daniel · March 16, 2023, 6:07pm

That sounds correct to me. Use normalized counts for species-richness comparisons and absolute counts for other purposes, such as DESeq2, which applies its own normalization. There is a discussion on the forum about providing a square-root normalization rather than simply scaling down to the smallest input size.
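
To illustrate why the two tables suit different jobs, here is a minimal sketch in Python. It assumes that "normalized" means scaling every sample proportionally down to the smallest sample's total and rounding to whole counts, as described above. It is not MEGAN's actual code, and the function names and example counts are hypothetical.

```python
def normalize_to_smallest(counts_by_sample):
    """Scale each sample so its total roughly matches the smallest sample's total.

    Assumption: simple proportional scaling followed by rounding.
    Empty samples (total of 0) are ignored when picking the target size.
    """
    totals = {s: sum(c.values()) for s, c in counts_by_sample.items()}
    nonzero_totals = [t for t in totals.values() if t > 0]
    if not nonzero_totals:
        raise ValueError("all samples are empty")
    target = min(nonzero_totals)

    normalized = {}
    for sample, counts in counts_by_sample.items():
        factor = target / totals[sample] if totals[sample] > 0 else 0.0
        normalized[sample] = {taxon: round(n * factor) for taxon, n in counts.items()}
    return normalized


def richness(counts, min_count=1):
    """Number of taxa with at least min_count assigned reads."""
    return sum(1 for n in counts.values() if n >= min_count)


# Hypothetical absolute counts: sample A was sequenced ten times deeper than B.
absolute = {
    "A": {"phage1": 900, "phage2": 98, "phage3": 2},
    "B": {"phage1": 60, "phage2": 40},
}
normalized = normalize_to_smallest(absolute)

for sample in absolute:
    print(sample,
          "richness (absolute):", richness(absolute[sample]),
          "richness (normalized):", richness(normalized[sample]))
```

In this made-up example, sample A appears richer than B on absolute counts only because it was sequenced more deeply; after scaling, the rare taxon rounds to zero and both samples have the same richness. The absolute table stays untouched for tools such as DESeq2 that estimate their own size factors.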