Mode of Normalization

Adit · September 22, 2017, 9:26am

Hi,
I am working with paired-end metagenomic datasets. I have generated RMA files for taxonomic analysis of all datasets, using read counts with the LCA algorithm. To compare the datasets, I used normalized counts and ignored all unassigned reads. Please let me know which mode of normalization MEGAN uses when comparing datasets.

Daniel · September 22, 2017, 1:54pm

Normalization is achieved as follows:
for each sample S and each class C, report |C|/|S| * m,

where |C| is the count for the class in S, |S| is the total count assigned for sample S, and m is the size of the smallest sample.

So, basically, counts are normalized to the smallest sample size.
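
To make the formula concrete, here is a minimal Python sketch of it. This is not MEGAN's own code; the function name `normalize_to_smallest`, the dictionary layout, and the example class names and counts are all made up for illustration. It assumes every sample has a non-zero total, and it does no rounding, since the rounding MEGAN applies is not stated here.

```python
def normalize_to_smallest(samples):
    """Apply |C|/|S| * m to every class count.

    `samples` maps a sample name to a {class name: count} dict
    (a hypothetical layout chosen for this example).
    """
    # |S|: total count per sample
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    # m: size of the smallest sample
    m = min(totals.values())
    return {
        name: {cls: count / totals[name] * m for cls, count in counts.items()}
        for name, counts in samples.items()
    }


# Made-up counts for two samples of different sizes
samples = {
    "A": {"Bacteroides": 50, "Escherichia": 150},   # |S| = 200
    "B": {"Bacteroides": 300, "Escherichia": 100},  # |S| = 400
}
normalized = normalize_to_smallest(samples)
print(normalized)
```

After this step every sample sums to m (200 in the example), so the smallest sample keeps its original counts and larger samples are scaled down.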

Adit · September 25, 2017, 12:44pm

Thank you, sir, for your reply. As I mentioned, I ignored the unassigned read counts while comparing my datasets. I still need to know whether m, the smallest sample size, includes unassigned reads or not.

Daniel · September 27, 2017, 2:45am

If you select ignore unassigned, then normalization is with respect to (the smallest number of) assigned reads only; otherwise, normalization is with respect to (the smallest number of) all reads.
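
A small Python sketch of how that choice changes m, under assumptions: the helper `smallest_sample_size`, its `ignored` parameter, the dictionary layout, and the counts are hypothetical and only mirror the description above, not MEGAN's implementation.

```python
def smallest_sample_size(samples, ignored=frozenset()):
    """Return m, the smallest per-sample total.

    Classes whose names are in `ignored` are left out of every total.
    """
    return min(
        sum(count for cls, count in counts.items() if cls not in ignored)
        for counts in samples.values()
    )


# Made-up counts; "Not assigned" stands for the unassigned reads
samples = {
    "A": {"Bacteria": 900, "Not assigned": 100},  # 1000 reads, 900 assigned
    "B": {"Bacteria": 700, "Not assigned": 500},  # 1200 reads, 700 assigned
}

m_all = smallest_sample_size(samples)                                  # all reads
m_assigned = smallest_sample_size(samples, ignored={"Not assigned"})  # assigned only
print(m_all, m_assigned)
```

Note that the sample defining m can change: with all reads counted, A is the smallest sample, but with unassigned reads ignored, B is.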

Adit · February 12, 2018, 12:15pm

Hi Daniel,
In a previous version of MEGAN (MEGAN version 3), there was an option to ignore “No hits” as well as “Not Assigned” when comparing multiple samples. In MEGAN 6, the only option is to ignore “Not Assigned”. Why is there no option for “No hits”, or can I assume that ignoring “Not assigned” also ignores “No hits”? This is very important, so please reply as soon as possible.
Thank you so much.

Daniel · February 12, 2018, 3:33pm

“Not assigned” excludes no hits as well.

Adit · February 13, 2018, 4:49am

Thank you so much Daniel.

KING · February 26, 2018, 12:57pm

Hi Daniel,
After normalizing manually with the formula mentioned above, how can the read counts be converted into percentages?