Very few aligned queries

eoreow · December 2, 2021, 10:55pm

After following the DIAMOND-MEGAN commands from the MEGAN tutorial/paper for PacBio RS II human gut microbiome data, I am finding that not many reads are getting assigned to taxa. The whole dataset is ~70 Gb, and the BLAST output reports “Reported 1483729 pairwise alignments, 1483729 HSPs. 921 queries aligned.” Of these, only ~16% of the aligned bases are assigned at the species level. I am wondering whether the root cause is more likely to be 1) contigs being made up of reads from different species (I noticed this in the long read inspector for reads that are classified only at a high level), or 2) a lot of errors that should be corrected with Pilon or other programs?
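
As a quick sanity check on those numbers, here is a minimal sketch that parses the summary line quoted above and computes the average number of alignments per aligned query. The helper name `parse_diamond_summary` and the regular expression are my own assumptions for illustration and are not part of DIAMOND or MEGAN; the pattern only matches the wording of the line as quoted in this post.

```python
import re


def parse_diamond_summary(line):
    """Hypothetical helper: pull the three counts out of the quoted summary line."""
    pattern = (
        r"Reported (\d+) pairwise alignments, (\d+) HSPs\.\s*"
        r"(\d+) queries aligned"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("summary line does not match the expected wording")
    alignments, hsps, queries_aligned = (int(g) for g in match.groups())
    return {
        "alignments": alignments,
        "hsps": hsps,
        "queries_aligned": queries_aligned,
    }


# The summary line exactly as quoted in this post.
summary = "Reported 1483729 pairwise alignments, 1483729 HSPs. 921 queries aligned."
stats = parse_diamond_summary(summary)

# Average number of reported alignments for each query that aligned at all.
per_query = stats["alignments"] / stats["queries_aligned"]
print(f"{per_query:.0f} alignments per aligned query")
```

With the figures above this comes to roughly 1,611 alignments for each of the 921 aligned queries, so the few sequences that do align each carry many hits.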

Given these results, what is a good way to proceed? I was thinking of doing something like what is suggested in this thread: http://megan.informatik.uni-tuebingen.de/t/can-blastx-be-a-problem-best-strategy-for-classifying-assembled-contigs-from-metagenomes-with-megan/107