I have a few questions related to meganizing DIAMOND output. We have a large metagenome data set that contains some host contamination; the samples were originally run last year with an older version of DIAMOND, v0.8.36. We have both the original unannotated DAA files and the annotated (meganized) ones from that time.
Is there a way to remove or ignore a specific taxonomic group during the comparison step (with all 68 samples loaded) while assessing functional information (KEGG)?
Can the originally annotated (meganized) files be filtered to remove the contaminating taxon? Or would we need to restart the annotation from scratch?
We can go back and re-meganize the older DAA files, but we noticed that this process seems to be extremely slow (each run has about 40M reads). Do you have any recommendations for speeding up this step? We’re running this on a cluster, and we have access to a local disk cache on each of the nodes, as well as an optimized GPFS file store (a networked file system).
EDIT: I should add, I’m using MEGAN v6.11.7 UE.