I am considering using your bundled “blast2lca” standalone tool (provided together with MEGAN6) as an easily scriptable and fast alternative to running MEGAN6 for every sample. However, I have a question regarding the reproducibility of LCA classification between the standalone blast2lca and the MEGAN6 GUI:
Am I supposed to get identical results with blast2lca and with MEGAN6 if I use the same default parameters and identical mapping files? When I compare the classification results (based on BLASTX tables), blast2lca assigns more reads than the MEGAN6 GUI does, and it places reads at more specific taxonomic levels.
The total number of classifications is only about 1% higher for blast2lca compared to MEGAN6, but some of the reads which were classified by both show strongly divergent results.
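A minimal sketch of the kind of comparison described above, assuming both outputs have already been reduced to a mapping from read ID to taxon ID. The function name `compare_assignments` and the toy read IDs are hypothetical and only illustrate the bookkeeping; they are not real results.

```python
def compare_assignments(megan, blast2lca):
    """Compare two {read_id: taxon_id} mappings and summarise the overlap."""
    shared = megan.keys() & blast2lca.keys()
    # Reads classified by both tools, but to different taxa
    differing = sorted(r for r in shared if megan[r] != blast2lca[r])
    return {
        "only_megan": len(megan.keys() - blast2lca.keys()),
        "only_blast2lca": len(blast2lca.keys() - megan.keys()),
        "both": len(shared),
        "differing": differing,
    }

# Toy data for illustration only (hypothetical read IDs)
megan = {"read1": 131567, "read2": 1224}
blast2lca = {"read1": 1232683, "read2": 1224, "read3": 2}

summary = compare_assignments(megan, blast2lca)
print(summary)
```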
For example, one read is classified only as far as “cellular organisms” by MEGAN6:
query = M03384:50:000000000-AJEFG:1:1101:17132:1094 TaxId=131567
But the same read is placed within the Proteobacteria, and even down to the species “Marinobacterium sp. AK27”, by blast2lca:
query=M03384:50:000000000-AJEFG:1:1101:17132:1094 classification=“d__2; 96;p__1224; 96;c__1236; 80;o__135619; 64;f__135620; 56;g__48075; 16;s__1232683; 8;”
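To make the blast2lca line easier to compare with a single MEGAN6 taxon ID, it can be split into its rank entries. This is a minimal sketch, assuming each entry has the form `rank__taxid; number;` and that the number is some per-rank support value (I am not certain how blast2lca defines it). The helper names `parse_blast2lca_path` and `deepest_with_support` are hypothetical.

```python
import re

def parse_blast2lca_path(classification):
    """Return a list of (rank, taxon_id, value) tuples, highest rank first."""
    # Each entry looks like "p__1224; 96;": rank letter, "__", taxon ID, then a number
    pattern = re.compile(r"([a-z])__(\d+);\s*(\d+(?:\.\d+)?);")
    return [(rank, int(taxid), float(value))
            for rank, taxid, value in pattern.findall(classification)]

def deepest_with_support(path, min_value):
    """Return the lowest-rank entry whose number is >= min_value, or None."""
    best = None
    for entry in path:
        if entry[2] >= min_value:
            best = entry
    return best

line = ("d__2; 96;p__1224; 96;c__1236; 80;o__135619; 64;"
        "f__135620; 56;g__48075; 16;s__1232683; 8;")
path = parse_blast2lca_path(line)

for rank, taxid, value in path:
    print(rank, taxid, value)

print(deepest_with_support(path, 50))
print(deepest_with_support(path, 100))
```

With this read, no rank reaches a value of 100, so a cutoff of 100 returns nothing, while lower cutoffs return progressively more specific taxa. Whether that relates to the MEGAN6 assignment is exactly what I am unsure about.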
For both I use the same mapping file (“gi2tax-July2016.bin”) and the same (default) parameters:
--minScore 50.0
--maxExpected 0.01
--topPercent 10.0
--minPercentIdentity 0.0
The default for MEGAN6 (at least in my installation) is NOT to use “Weighted LCA” (I am not sure whether that is also the case for blast2lca, but I assume so). The only MEGAN6 defaults that I cannot seem to set in blast2lca are “Min Support Percent” and “Min Support” (which are “0.01” and “1” in MEGAN6, respectively).
Are such differences to be expected? What settings would I have to use for blast2lca in order to get identical results compared to using the MEGAN6 GUI?