LCA Coverage Percent

schorlton · April 29, 2021, 7:57pm

When I run blast2rma in longRead mode, using nucleotide alignments:

It seems the default --lcaCoveragePercent is 51 (https://github.com/husonlab/megan-ce/blob/2a3dc6e4c654636688591e69177e2aa41f43ad22/src/megan/core/Document.java#L112).
Is this expected? Your MEGAN-LR paper suggests that the default should be 80% (https://biologydirect.biomedcentral.com/articles/10.1186/s13062-018-0208-7).
Or are these different parameters?
Is the flag --minPercentReadCover implemented? I seem to have reads classified with 2% coverage even when using it and setting it much higher.

Thanks!

Daniel · May 4, 2021, 11:39am

In the paper we did suggest 80% as the default, but we have since found that 51% seems to work better. However, there is a trade-off between false positives (assignments that are too specific) and false negatives (assignments that are not specific enough).
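To make the trade-off concrete, here is a simplified, hypothetical Python sketch of a coverage-based LCA. It is not MEGAN's implementation: it assumes that the percentage is measured against the aligned positions of the read, that every hit also counts toward all ancestors of its taxon, and that the read goes to the deepest node whose hits cover at least that percentage. The function names and the two-taxon example are made up for illustration.

```python
from collections import defaultdict


def union_length(intervals):
    """Number of read positions covered by half-open [start, end) intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_lca(alignments, lca_coverage_percent):
    """Hypothetical coverage-based LCA (an illustration, not MEGAN's code).

    alignments: list of (taxon_path, start, end), where taxon_path runs from
    the root down to the hit's taxon. Returns the deepest node whose hits
    cover at least lca_coverage_percent of the aligned read positions.
    """
    aligned = union_length([(s, e) for _, s, e in alignments])
    needed = aligned * lca_coverage_percent / 100.0

    # Each alignment also counts toward every ancestor of its taxon.
    by_node = defaultdict(list)
    for path, start, end in alignments:
        for depth in range(1, len(path) + 1):
            by_node[tuple(path[:depth])].append((start, end))

    passing = [node for node, ivs in by_node.items() if union_length(ivs) >= needed]
    if not passing:
        return None
    deepest = max(len(node) for node in passing)
    candidates = [node for node in passing if len(node) == deepest]
    # If several nodes tie at the deepest level, fall back to their common ancestor.
    common = []
    for labels in zip(*candidates):
        if len(set(labels)) != 1:
            break
        common.append(labels[0])
    return common[-1] if common else None


# Made-up example: one read with overlapping hits to two sibling species.
ECOLI = ["root", "Bacteria", "Enterobacteriaceae", "Escherichia coli"]
SALMONELLA = ["root", "Bacteria", "Enterobacteriaceae", "Salmonella enterica"]
hits = [(ECOLI, 0, 700), (SALMONELLA, 600, 1000)]

print(coverage_lca(hits, 51))  # Escherichia coli (covers 70% of aligned positions)
print(coverage_lca(hits, 80))  # Enterobacteriaceae (only the family reaches 80%)
```

Under these assumptions, the lower threshold lets the read land on a species that covers only part of it (more specific, with a higher risk of a false positive), while the higher threshold pushes it up to the family (safer, but less specific).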

Re 2: did you see the following line in the message window:

Minimum percentage of read to be covered: 50%

(or whatever value you set)?

This will help me track down what is going on.

schorlton · May 4, 2021, 8:33pm

Thanks for the reply! Is that the recommended LCA coverage percent (lcp) for contigs, long reads, or both? Does it matter whether these are nucleotide-nucleotide or nucleotide-protein alignments? Do you have a new set of recommended parameters?

Re #2: I just sent you an RMA file and details by email so you can debug it.

Thanks!

Daniel · May 5, 2021, 3:24pm

I took a look at your file. The RMA was generated without supplying the associated read sequences. Unfortunately, MEGAN uses those sequences to determine the length of each read, and it uses a length of 0 if the sequence is not present. That is why all your reads pass the minimum coverage filter. Can you recreate the RMA files and supply the corresponding FASTA or FASTQ files while doing so? Then the filter should work.
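A minimal, hypothetical Python sketch of why a missing read length defeats the filter. The function names are made up, and the assumption that a length of 0 simply lets the read through is inferred from the explanation above; this is not MEGAN's actual code.

```python
def covered_positions(intervals):
    """Count read positions covered by half-open [start, end) alignment intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return len(covered)


def passes_min_read_cover(intervals, read_length, min_percent_read_cover):
    """Hypothetical minimum read-cover check."""
    if read_length <= 0:
        # Assumed behavior: no sequence was supplied, so the length is 0,
        # the percentage cannot be computed, and nothing gets rejected.
        return True
    percent = 100.0 * covered_positions(intervals) / read_length
    return percent >= min_percent_read_cover


hits = [(0, 200)]  # one 200-bp alignment on a 10,000-bp read (2% covered)
print(passes_min_read_cover(hits, 10_000, 50))  # False: length known from FASTA/FASTQ
print(passes_min_read_cover(hits, 0, 50))       # True: reads not supplied, length 0
```

With the read length available, the 2%-covered read is rejected at a 50% threshold; with a length of 0 it slips through, which matches the symptom reported in the original question.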

Regarding your question about nucleotide-nucleotide alignments: we have not looked into those. DNA-to-DNA alignment only makes sense (in my view) if you know that the organisms in your sample have already been sequenced and that their DNA is available, which is usually not the case.