Questions on the Meganizing step

I have a few questions related to meganizing DIAMOND output. We have a large metagenome data set that has some host contamination. The samples were originally run last year with an older version of DIAMOND (v0.8.36), and we have both the original unannotated DAA files and the annotated (meganized) ones from that time.

  1. Is there a way to remove or ignore a specific taxonomic group during the comparison step (with all 68 samples loaded) while assessing functional information (KEGG)?

  2. Can the originally annotated (meganized) files be filtered for the contaminating taxon? Or would we need to restart the annotation from scratch?

  3. We can go back and re-meganize the older DAA files, but we noticed that this process is extremely slow (each run has about 40M reads). Do you have any recommendations for this step? We’re running this on a cluster, with access to a local disk cache on each node as well as an optimized GPFS file store (networked file system).

EDIT: I should add, I’m using MEGAN v6.11.7 UE.

I also tested the ‘contaminants’ marking/removal feature you mentioned with our original DIAMOND runs. The reads do get annotated with taxa and functions, but the contaminants are still present. I’m using the daa-meganizer option --conFile as follows:

daa-meganizer -i "$DAA" \
    -a2t "$MEGAN_DATA/$ACC2TAX" \
    -a2interpro2go "$MEGAN_DATA/$ACC2IP" \
    -a2seed "$MEGAN_DATA/$ACC2SEED" \
    -a2kegg "$MEGAN_DATA/$ACC2KEGG" \
    -a2eggnog "$MEGAN_DATA/$ACC2EGGNOG" \
    --conFile contaminants.txt \
    -v > "$NAME.meganize.log"

contaminants.txt is:

Chordata

Am I missing a crucial step?

Thanks, @Daniel!

Please update to 6.12.0; the contaminant code should work now.

Thanks, I’ll give that a try!

Just to follow up: this does appear to work quite well. Thanks for adding this!
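
For anyone landing on this thread later, here is a rough sketch of how the call above could be applied to a whole directory of DAA files once MEGAN has been updated as suggested. It only reuses the daa-meganizer options and the mapping-file variables from the original post. DAA_DIR and DRY_RUN are hypothetical helper variables made up for this sketch, not MEGAN options, and the one-log-per-sample naming is an assumption.

```shell
#!/bin/sh
# Sketch only: batch version of the daa-meganizer call from this thread.
# DAA_DIR and DRY_RUN are hypothetical helper variables, not MEGAN options.
set -e

DAA_DIR="${DAA_DIR:-.}"   # directory holding the *.daa files (assumption)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = actually run it

# One taxon name per line, as in the original post.
printf 'Chordata\n' > contaminants.txt

meganize_one() {
    daa="$1"
    name="$(basename "$daa" .daa)"
    # Same options as in the post; MEGAN_DATA and the ACC2* variables
    # must already point at your mapping files.
    set -- daa-meganizer -i "$daa" \
        -a2t "$MEGAN_DATA/$ACC2TAX" \
        -a2interpro2go "$MEGAN_DATA/$ACC2IP" \
        -a2seed "$MEGAN_DATA/$ACC2SEED" \
        -a2kegg "$MEGAN_DATA/$ACC2KEGG" \
        -a2eggnog "$MEGAN_DATA/$ACC2EGGNOG" \
        --conFile contaminants.txt \
        -v
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"                 # show the command without running it
    else
        "$@" > "$name.meganize.log"        # one log file per sample
    fi
}

# Process every DAA file in DAA_DIR; does nothing if there are none.
for daa in "$DAA_DIR"/*.daa; do
    [ -e "$daa" ] || continue
    meganize_one "$daa"
done
```

By default this only prints the commands; set DRY_RUN=0 once the printed lines look right.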