Differences in classification using blast2lca and MEGAN6: Expected or erroneous?

I am considering using your bundled “blast2lca” standalone tool (provided together with MEGAN6) as an easily scriptable, fast alternative to running MEGAN6 for every sample. However, I have a question regarding the reproducibility of LCA classification between the standalone blast2lca and the MEGAN6 GUI:

Am I supposed to get identical results with blast2lca and with MEGAN6 if I use the same default parameters and identical mapping files? I ask because, when I compare the classification results (based on BLASTX tables), I get more classifications at higher taxonomic levels using the standalone blast2lca tool than with the MEGAN6 GUI.
The total number of classifications is only about 1% higher for blast2lca than for MEGAN6, but some of the reads that were classified by both show strongly divergent results.

For example, one read is classified only as far as “cellular organisms” by MEGAN6:
query = M03384:50:000000000-AJEFG:1:1101:17132:1094 TaxId=131567

But the same read is classified as “Proteobacteria” and even down to the species “Marinobacterium sp. AK27” by blast2lca:
query=M03384:50:000000000-AJEFG:1:1101:17132:1094 classification=“d__2; 96;p__1224; 96;c__1236; 80;o__135619; 64;f__135620; 56;g__48075; 16;s__1232683; 8;”
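To compare such lines programmatically, the classification string can be split into its rank entries. The following is only a minimal sketch: it assumes each entry has the form `rank__taxid; number;` as in the line above, and that the number is a percentage of support for that node (this interpretation is an assumption, not something stated in the tool output). The function name `parse_classification` is hypothetical.

```python
import re

# One "rank__taxid; number;" entry, e.g. "p__1224; 96;"
ENTRY = re.compile(r"([a-z])__(\d+);\s*(\d+(?:\.\d+)?);")


def parse_classification(path):
    """Return a list of (rank_letter, taxon_id, number) tuples, root to leaf."""
    return [(rank, int(taxid), float(num)) for rank, taxid, num in ENTRY.findall(path)]


# Classification string copied from the blast2lca line above
path = ("d__2; 96;p__1224; 96;c__1236; 80;o__135619; 64;"
        "f__135620; 56;g__48075; 16;s__1232683; 8;")

for rank, taxid, num in parse_classification(path):
    print(rank, taxid, num)
```

Parsed this way, the read above has seven entries, from domain (taxon 2) to species (taxon 1232683), with the number dropping from 96 to 8 along the path.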

For both I use the same mapping file (“gi2tax-July2016.bin”) and for both I use the same (default) parameters:
--minScore 50.0
--maxExpected 0.01
--topPercent 10.0
--minPercentIdentity 0.0

The default for MEGAN6 (at least in my installation) is NOT to use “Weighted LCA” (I am not sure whether that is also the case for blast2lca, but I assume so). The only MEGAN6 defaults that I cannot seem to set in blast2lca are “Min Support Percent” and “Min Support” (which are “0.01” and “1” in MEGAN6, respectively).

Are such differences to be expected? What settings would I have to use for blast2lca in order to get identical results compared to using the MEGAN6 GUI?

I also get different results, with blast2lca generally assigning blast results to a lower node (higher taxonomic resolution) than MEGAN6. It looks like exporting data using MEGAN6 “readName_to_taxonName” only includes nodes with 100% support (?), while blast2lca and MEGAN6 “readName_to_taxonPath” include lower nodes with support less than 100%.
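To illustrate what I mean, here is a small sketch of that guess. It is not MEGAN's or blast2lca's actual algorithm; it only assumes that the numbers in the blast2lca path are percentages of support and cuts the path at a chosen threshold. The function name `deepest_supported` and the thresholds are made up for the illustration.

```python
def deepest_supported(path, min_percent=100.0):
    """Return the deepest (rank, taxid, percent) entry whose percent is at least
    min_percent, walking from root to leaf; None if even the first entry fails."""
    best = None
    for entry in path:
        if entry[2] < min_percent:
            break  # support drops below the threshold, stop descending
        best = entry
    return best


# Path of the read from the first post, root to leaf (percent interpretation assumed)
path = [("d", 2, 96.0), ("p", 1224, 96.0), ("c", 1236, 80.0),
        ("o", 135619, 64.0), ("f", 135620, 56.0),
        ("g", 48075, 16.0), ("s", 1232683, 8.0)]

print(deepest_supported(path))        # no listed rank reaches 100%
print(deepest_supported(path, 90.0))  # stops at the phylum entry
```

With a 100% threshold none of the listed ranks qualifies, which would be consistent with the read ending up above the domain level (TaxId 131567) in the MEGAN6 export, whereas the full path keeps every node regardless of its percentage.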

No, blast2lca does not replicate a MEGAN analysis.

To replicate a MEGAN analysis via a script you should use blast2rma, or daa-meganizer, to first produce a MEGAN rma file, or meganized daa file, and then use rma2info, or daa2info, respectively, to extract whatever classifications you are after.

What is the difference between these two algorithms? I want to use blast2lca to parse the BLAST results for my predicted genes. Is that the right approach?