MEGAN for non-bacterial or mixed datasets

Hi all.

I have sequencing data from a marine animal,
and it is predicted to contain bacterial reads.

My goal is to investigate those bacterial, or any other microbial, reads.
In this situation, can I run MEGAN on BLAST output without first separating the microbial reads from the animal's own reads?
(I mean, does MEGAN separate metagenomic reads from animal or plant reads?)


It would be best to first use a DNA alignment tool such as bowtie2 to identify host reads and to remove them from your dataset before using DIAMOND to align your reads to the NR database for microbial analysis.
There are two reasons to do so:

  1. speed: in the past, aligning reads against the NR database took a huge amount of time, so reducing the number of reads to align was helpful. This matters much less since the introduction of DIAMOND last year, which performs BLASTX-like alignment far faster than BLASTX itself.
  2. false positives: host reads lead to many false positive alignments and you will “detect” many microbes that are not really there.

So in summary, yes, you can run DIAMOND on your complete set of reads, followed by MEGAN, but then expect to see many false positive identifications. (So it is better to first remove host reads using the host's genome, or the genome of a related species.)
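
A rough sketch of the two steps, in case it helps. This is not an official recipe, only one way to wire the tools together from Python. It assumes single-end reads, a host index already built with bowtie2-build, and a recent DIAMOND version that accepts --outfmt 100 (DAA output). All file names (host_idx, reads.fq, nr.dmnd, non_host.fq, microbial.daa) are made-up placeholders.

```python
import shlex
import subprocess


def build_commands(host_index, reads, diamond_db,
                   non_host="non_host.fq", out_daa="microbial.daa", threads=4):
    """Return [bowtie2 command, diamond command] as argument lists.

    All file names are placeholders chosen for this example.
    """
    host_filter = [
        "bowtie2",
        "-p", str(threads),
        "-x", host_index,     # index prefix created beforehand with bowtie2-build
        "-U", reads,          # unpaired (single-end) reads
        "--un", non_host,     # reads that did NOT align to the host go here
        "-S", "/dev/null",    # the host alignments themselves are discarded
    ]
    microbial_search = [
        "diamond", "blastx",
        "--db", diamond_db,   # DIAMOND-formatted NR database
        "--query", non_host,  # only the host-depleted reads are aligned
        "--out", out_daa,
        "--outfmt", "100",    # DAA format (assumes a recent DIAMOND version)
        "--threads", str(threads),
    ]
    return [host_filter, microbial_search]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all(...) to actually execute them.
    for cmd in build_commands("host_idx", "reads.fq", "nr.dmnd"):
        print(shlex.join(cmd))
```

For paired-end reads, bowtie2 takes -1/-2 instead of -U and --un-conc instead of --un. The resulting DAA file is what you would then load into MEGAN.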

Thank you for the explanation!