I am working with paired-end metagenomic datasets. I have generated RMA files for taxonomic analysis of all datasets and taken the read counts from the LCA algorithm. To compare the datasets, I used normalized counts and ignored all unassigned reads. Please let me know which mode of normalization MEGAN applies when comparing datasets.
Normalization is done as follows:
For each sample S and each class C, report (|C| / |S|) * m,
where |C| is the count for class C in S, |S| is the total count assigned for sample S, and m is the size of the smallest sample.
So, basically, counts are normalized to the smallest sample size.
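For anyone who wants to reproduce this by hand, here is a minimal Python sketch of the formula above. It is not MEGAN's own code: the function name `normalize_counts`, the sample names, and the counts are made up for illustration, and it assumes every sample has at least one read. Whether MEGAN rounds the result to whole reads is not stated in this thread, so the sketch leaves the values unrounded.

```python
def normalize_counts(samples):
    """Scale each class count to the size of the smallest sample.

    samples maps a sample name to a dict of {class name: read count}.
    Returns the same structure with |C| / |S| * m in place of |C|.
    """
    # |S|: total count per sample
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    # m: size of the smallest sample
    m = min(totals.values())
    # Multiply before dividing; the result is the same as |C| / |S| * m
    return {
        name: {cls: count * m / totals[name] for cls, count in counts.items()}
        for name, counts in samples.items()
    }


# Hypothetical counts, for illustration only
samples = {
    "A": {"Bacteroides": 600, "Prevotella": 400},  # |S| = 1000
    "B": {"Bacteroides": 100, "Prevotella": 400},  # |S| = 500, so m = 500
}
normalized = normalize_counts(samples)
print(normalized)
```

After normalization every sample sums to m, so the smallest sample keeps its original counts and the larger ones are scaled down.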
Thank you, sir, for your reply. As I mentioned previously, I ignored the unassigned read count while comparing all my datasets. I still need to know whether m, the smallest sample size, includes unassigned reads or not.
If you select ignore unassigned, then normalization is with respect to (the smallest number of) assigned reads only; otherwise, normalization is with respect to (the smallest number of) all reads.
In the previous version of MEGAN (MEGAN version 3), there was an option to ignore “No hits” as well as “Not assigned” when comparing multiple samples. In MEGAN 6, the only option given is to ignore “Not assigned”. Why is there no option for “No hits”? Or should I assume that ignoring “Not assigned” also ignores “No hits”? This is very important, please reply ASAP.
Thank you so much.
Ignoring “Not assigned” excludes “No hits” as well.
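To make the effect on m concrete, here is a small Python sketch based on the replies above. It is only an illustration, not MEGAN's implementation: the helper `smallest_sample_size`, the `IGNORED` set, and all counts are hypothetical, and it assumes the two excluded classes appear under the labels "Not assigned" and "No hits".

```python
# Classes dropped when "ignore unassigned" is selected (labels assumed)
IGNORED = {"Not assigned", "No hits"}


def smallest_sample_size(samples, ignore_unassigned):
    """Return m, the smallest per-sample total used for normalization."""
    def size(counts):
        # Sum every class, skipping the ignored ones only when requested
        return sum(
            n for cls, n in counts.items()
            if not (ignore_unassigned and cls in IGNORED)
        )
    return min(size(counts) for counts in samples.values())


# Hypothetical counts, for illustration only
samples = {
    "A": {"Bacteroides": 600, "Prevotella": 400, "Not assigned": 50, "No hits": 150},
    "B": {"Bacteroides": 100, "Prevotella": 400, "Not assigned": 300, "No hits": 500},
}

m_assigned = smallest_sample_size(samples, ignore_unassigned=True)
m_all = smallest_sample_size(samples, ignore_unassigned=False)
print(m_assigned, m_all)  # 500 1200
```

Note that in this made-up example the smallest sample is B when unassigned reads are ignored, but A when all reads are counted, so the choice can change which sample sets m.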
Thank you so much Daniel.
May I know how the read counts can be converted into percentages after doing the normalization manually with the above-mentioned formula?