Dear developers,
Following the issue I encountered with MALT v. 0.5.* (described here: LCA placement failure with Malt v. 0.5.2 and 0.5.3), I switched back to v. 0.4.1.
With this version I ran into a different issue:
By default, the LCA placement appears to be made with a “naive algorithm” and 80% “coverage”, as stated by the malt-run log:
Using 'Naive LCA' algorithm (80.0 %) for binning: Taxonomy
If I understand correctly (also from some testing that I have done), this means that if 80% or more of the references hit by a read belong to the same taxon, the read is assigned to that taxon and the other references that were hit are ignored. For example, if the index contains 8 references for Yersinia pestis and only 2 for Yersinia pseudotuberculosis, a read hitting all 10 will be assigned to Y. pestis (even if the hit to Y. pseudotuberculosis is the best hit).
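To make my reading of this behaviour concrete, here is a toy Python sketch. It is not MALT's code and may not match its implementation: the function `toy_lca`, the `LINEAGES` table, and the assumption that the 80% threshold is inclusive are all mine, chosen only to reproduce what I observed in testing.

```python
from collections import Counter

# Hypothetical, simplified lineages (root first); not taken from a real taxonomy dump.
LINEAGES = {
    "Yersinia pestis": ["root", "Yersinia", "Yersinia pestis"],
    "Yersinia pseudotuberculosis": ["root", "Yersinia", "Yersinia pseudotuberculosis"],
}


def toy_lca(hit_taxa, percent_to_cover=80.0):
    """Return the deepest taxon whose subtree holds at least
    percent_to_cover % of the hits (assumed behaviour, inclusive threshold)."""
    covered = Counter()  # number of hits falling under each node
    depth = {}           # distance of each node from the root
    for taxon in hit_taxa:
        for level, node in enumerate(LINEAGES[taxon]):
            covered[node] += 1
            depth[node] = level
    total = len(hit_taxa)
    # Keep every node that covers enough hits; the root always qualifies.
    candidates = [n for n, c in covered.items() if c * 100 >= percent_to_cover * total]
    # Prefer the most specific node, breaking ties by hit count.
    return max(candidates, key=lambda n: (depth[n], covered[n]))


uneven = ["Yersinia pestis"] * 8 + ["Yersinia pseudotuberculosis"] * 2
even = ["Yersinia pestis"] * 5 + ["Yersinia pseudotuberculosis"] * 5

print(toy_lca(uneven))         # Yersinia pestis
print(toy_lca(uneven, 100.0))  # Yersinia
print(toy_lca(even))           # Yersinia
```

In this sketch the same read ends up at species level or at genus level depending only on how many references each species has in the index, which is the effect I am worried about.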
As with the previous issue, I could not find a way to change this behaviour using the available command-line options.
I think this can be problematic for people using customized reference datasets in which taxa are unevenly represented. Typically, for datasets comprising many references for the target taxa and just a few “outgroups”, nonspecific matches might be reported as specific matches to the target taxa.