We are using MEGAN6 Community version 6.7.17 on Linux (tested on both Ubuntu 14.04 and CentOS 6.6).
The data we are investigating come from metavirome assemblies. To adjust the representation of a particular gene in a dataset, we calculate contig coverage and use it to assign a magnitude value to each contig. This value is then included in the data we BLAST as a magnitude tag (e.g. magnitude=55) in the descriptor of each query sequence.
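To make the tagging concrete, here is a minimal Python sketch of how the magnitude tags in our FASTA descriptors can be tallied. The header layout, the function name `read_magnitudes`, and the rule that an untagged header counts as one read are all illustrative assumptions; this is not how MEGAN itself parses the tag.

```python
import re

# Assumed tag format, taken from the example in this post: magnitude=55
MAGNITUDE_RE = re.compile(r"magnitude=(\d+(?:\.\d+)?)")

def read_magnitudes(fasta_lines):
    """Return (number of sequences, summed magnitude) over FASTA header lines."""
    n_reads = 0
    total = 0.0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        n_reads += 1
        match = MAGNITUDE_RE.search(line)
        # Assumption: a header without a magnitude tag counts as a single read.
        total += float(match.group(1)) if match else 1.0
    return n_reads, total

# Hypothetical records, only to show the two counts diverging.
example = [
    ">contig_1 hypothetical protein magnitude=55",
    "MKTAYIAKQR",
    ">contig_2 capsid protein magnitude=3",
    "MSLLTEVETY",
    ">contig_3 untagged protein",
    "MADEEKLPPG",
]

n_reads, adjusted = read_magnitudes(example)
print(n_reads, adjusted)
```

The first number corresponds to what we call the absolute read count and the second to the magnitude-adjusted count.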
This worked fine in previous versions (MEGAN5) and continues to work in MEGAN6, but we have encountered an issue with the magnitude adjustment when we compare individual datasets, specifically when we make a normalised comparison.
When we highlight multiple magnitude-adjusted datasets in the compare window and select “use Normalized counts”, we end up with grossly inflated counts for our datasets.
The example we have provided is of a comparison of three different datasets:
samplename | absolute read count | magnitude adjusted counts
Xesto89 | 383 | 5688
Xesto7 | 346 | 5336
Xesto155 | 315 | 4019
Yet if we try to compare these datasets we get the following counts:
samplename | non normalised comparison | normalised comparison
Xesto89 | 5668 | 59676
Xesto7 | 5336 | 61987
Xesto155 | 4019 | 51279
What we would expect is a comparison in which 4019 (or thereabouts) magnitude-adjusted reads are presented for each dataset. This is not the case.
Our best guess is that, when the comparison is made, MEGAN6 tries to take 4019 reads from each dataset for the “normalised” comparison. However, we have fewer than 400 actual reads per sample, so it appears to be artificially duplicating reads. There also seems to be an additional magnitude adjustment going on, hence the final counts showing a >10-fold increase in the number of reads in our datasets.
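The gap between what we expected and what we see can be put in numbers with a short Python sketch. It assumes that "normalised" means scaling every sample to the smallest magnitude-adjusted total; that is our expectation, not something we have confirmed about MEGAN6's behaviour. The variable names are just for illustration.

```python
# Counts copied from the comparison table in this post.
# Note: the first table lists 5688 for Xesto89; the comparison shows 5668.
magnitude_adjusted = {"Xesto89": 5668, "Xesto7": 5336, "Xesto155": 4019}
observed_normalised = {"Xesto89": 59676, "Xesto7": 61987, "Xesto155": 51279}

# Assumption: normalisation scales every sample to the smallest total.
target = min(magnitude_adjusted.values())

# How many times larger each reported normalised total is than the target.
inflation = {
    name: round(observed / target, 2)
    for name, observed in observed_normalised.items()
}

for name, factor in inflation.items():
    print(f"{name}: expected about {target}, "
          f"observed {observed_normalised[name]} ({factor}x)")
```

Under that assumption, every sample ends up more than ten times larger than the target, which is the inflation described above.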
When we compare many samples, this issue becomes particularly problematic.
We have attached a normalized and a non-normalized comparison of our magnitude-adjusted datasets, along with the RMA6 files for the three datasets. A single example of the associated BLAST file and FASTA file is attached as well for reference.
These were archived in a tar.gz file due to upload limitations:
files ending in blast.gz are gzipped BLAST files
files ending in .faa are the uncompressed FASTA files (with magnitude values in descriptors)
files ending in rma6 are the corresponding MEGAN files
example.normalized.magadjusted.megan is the resulting normalized comparison of these three datasets
example.nonnormalized.magadjusted.megan is the resulting non-normalized comparison of these three datasets
If you need any more information, or if we are doing something wrong, please let me know.
Thanks,
Patrick
meganmagnitudeissue.tar.gz (2.4 MB)