Read numbers in Megan "no hits"

I understand that the number of reads in DIAMOND output may be considerably lower than the number of input reads, because the output excludes reads for which no alignment is found, i.e. “no hit” reads.
How is it, then, that reads from DIAMOND output can end up in the “no hits” node when viewed in MEGAN?

“No Hits” refers to reads that lack any alignments. Typically, this category remains empty when using DIAMOND because DAA files don’t include reads without alignments. However, with BLAST or certain other alignment methods, this category may be non-empty.

DIAMOND’s --unal option controls whether unaligned queries are reported or not.

  • Set it to 0 for “no” if you don’t want unaligned queries to be reported.
  • Set it to 1 for “yes” if you want unaligned queries to be reported.

By default, unaligned queries are reported for the BLAST pairwise, BLAST XML, and SAM formats by DIAMOND.
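
One way to check how many input reads DIAMOND left out is to compare the input read IDs against the query IDs in the alignment output. The following is a minimal sketch, assuming the alignments have been exported to BLAST tabular format (where the first tab-separated column is the query ID); the function name `unaligned_read_ids` and the example IDs are hypothetical.

```python
def unaligned_read_ids(input_ids, tabular_lines):
    """Return the input read IDs that never appear as a query in the alignments.

    input_ids: iterable of read IDs from the DIAMOND input file.
    tabular_lines: iterable of BLAST tabular lines (query ID in column 1).
    """
    aligned = set()
    for line in tabular_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        aligned.add(line.split("\t", 1)[0])
    return set(input_ids) - aligned


# Hypothetical example: read2 has no alignment, so it is absent from the output.
reads = ["read1", "read2", "read3"]
hits = [
    "read1\tprotA\t95.0\t120\t6\t0\t1\t360\t10\t129\t1e-50\t210",
    "read3\tprotB\t81.5\t98\t18\t0\t4\t297\t55\t152\t2e-30\t140",
]
print(sorted(unaligned_read_ids(reads, hits)))  # ['read2']
```

Reads returned by this check are the ones that would be candidates for the “No Hits” node if they were reported at all.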

@Anupam
I believe you have described the “not assigned” node, where alignments are present but they don’t exceed preset thresholds. In the “no hits” node, there are no alignments present.

Hi @brian

MEGAN applies a threshold to the bitscore of hits and disregards any hit with a bitscore below this threshold for taxonomy assignment. Consequently, a read may have a hit yet remain without a taxonomic assignment; such reads are termed “not assigned”.

Best,
Anupam

@Anupam Yes, that is my understanding too. My question is about the “no hits” node/classification.

Hi @brian,

“No Hits” refers to reads that lack any alignments. Typically, this category remains empty when using DIAMOND because DAA files don’t include reads without alignments. However, with BLAST or certain other alignment methods, this category may be non-empty.

DIAMOND’s --unal option controls whether unaligned queries are reported or not.

  • Set it to 0 for “no” if you don’t want unaligned queries to be reported.
  • Set it to 1 for “yes” if you want unaligned queries to be reported.

By default, unaligned queries are reported for the BLAST pairwise, BLAST XML, and SAM formats in DIAMOND.

On the other hand, “Not Assigned” comprises reads that possess alignments but couldn’t be assigned within the specified classification.
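
The distinction between the two nodes can be summarised in a few lines. This is only a simplified illustration of the definitions given in this post, not MEGAN's actual implementation; the function `megan_node` and its parameters are hypothetical. Note that "alignments" here means the alignments MEGAN actually keeps, which matters for the long-read case discussed later in this thread.

```python
from typing import Optional, Sequence


def megan_node(kept_alignments: Sequence[str], assigned_taxon: Optional[str]) -> str:
    """Illustrative only: pick the node a read is shown on.

    kept_alignments: the alignments for this read that remain available.
    assigned_taxon: the taxon chosen for the read, or None if none was chosen.
    """
    if not kept_alignments:
        # No alignments at all for this read.
        return "No Hits"
    if assigned_taxon is None:
        # Alignments exist, but the read could not be assigned
        # within the chosen classification.
        return "Not Assigned"
    return assigned_taxon


print(megan_node([], None))                   # No Hits
print(megan_node(["protA"], None))            # Not Assigned
print(megan_node(["protA"], "Escherichia"))   # Escherichia
```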

Could you kindly provide us with the DAA file exhibiting this issue, along with the DIAMOND command you used to generate the DAA file and the MEGAN command used to meganize it? (Logs would be even more helpful.) This will help us investigate the matter further.

I have also revised the previous response to enhance clarity.

Best regards,
Anupam

Hi @Anupam
Apologies for the delayed response. Could you take a look at the attached diamond file - you can see that there are 393 reads present in the “no hits” node. Shouldn’t this be empty if DAA files don’t include reads without alignments?
daa file

Regards,
Brian.

Hi @brian,

Thank you for the DAA file. This is a case of long-read meganization. Although these reads have alignments, the alignments are filtered out by MEGAN’s feature that skips mini-alignments that are not at the beginning or end of the read (this happens at a very early stage). As a result, 393 reads are left without any usable alignments and are placed on the “no hits” node.

One more observation: even though you’re utilizing the long-read mode in MEGAN for meganization, your file reports the total number of reads on the node instead of bases. Did you employ DIAMOND’s long-read mode for alignment, or did you perform alignment in the default DIAMOND mode and then apply the long-read algorithm in MEGAN for meganization?

Best regards,
Anupam

Hi @Anupam
Thanks for this clarification. In response to your question about long-read mode in the DIAMOND alignment: yes, I have that switched on. My settings are as follows:
-F 15 --range-culling --top 10
Is this correct?

Hi @brian, this is fine. What was your meganization command?

Hi @Anupam
It was
--minSupport 1 --minPercentIdentity 70 --maxExpected 1.0E-9 --lcaAlgorithm longReads --lcaCoveragePercent 51 --longReads --readAssignmentMode readCount --only none

Hi @brian,

That’s fine. Since you set --readAssignmentMode to readCount, MEGAN reports the number of reads on each node instead of the number of bases.

Best,
Anupam
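
To illustrate the difference between counting reads and counting bases on a node, here is a minimal sketch. It is not MEGAN code: the function `node_total` is hypothetical, the mode name `readCount` is taken from the command above, and `alignedBases` is assumed here as the name of the base-counting mode.

```python
def node_total(aligned_bases_per_read, mode="readCount"):
    """Illustrative only: total reported for a node under two counting modes.

    aligned_bases_per_read: one integer per read assigned to the node,
    giving the number of aligned bases in that read.
    """
    if mode == "readCount":
        # Every read contributes 1, regardless of its length.
        return len(aligned_bases_per_read)
    if mode == "alignedBases":
        # Every read contributes its number of aligned bases.
        return sum(aligned_bases_per_read)
    raise ValueError(f"unsupported mode: {mode}")


# Three hypothetical long reads assigned to the same node.
reads_on_node = [300, 1200, 450]
print(node_total(reads_on_node, "readCount"))     # 3
print(node_total(reads_on_node, "alignedBases"))  # 1950
```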