Why am I getting a species assigned for reads with more than 1 match to different species?

danimfernandes · September 10, 2018, 12:32pm

Hello.

I ran “blastn” against a custom reference of 16 genomes and loaded the output into MEGAN with the following parameters (which can also be seen at the bottom of the screenshot):

MinScore: 35
TopPercent: 10
MinSupportPercent: 0.005
MinSupport: 1
LCA algo: weighted
Percent to Cover: 80

As seen below, a read that was assigned to Linum also has matches aligning to Castanea, one of which even has a higher score than the top-scoring match for Linum.

[Screenshot from 2018-09-10 14-18-27]

How can I improve the specificity of the LCA algo for these situations?

Thanks in advance,
Daniel

Daniel · September 25, 2018, 6:01am

Looks like the weighted LCA might be working as intended?
The read has equally good alignments to two different species, so the unweighted LCA would place the read on the LCA of the two, whereas the weighted LCA will take weights into account. Here I am presuming that the Linum species has a much higher weight than the Castanea species, and so the read gets assigned to the species of higher weight. (A species' weight is based on the number of reads that align only to that species and to no other.)
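To make the difference concrete, here is a minimal Python sketch of the two placements. It is a toy illustration and not MEGAN's actual code: the taxonomy, the weights, and the function names are all made up, and it assumes that the weighted LCA places a read on the lowest node whose subtree holds at least "Percent to Cover" of the total weight of the matched species.

```python
# Toy taxonomy as child -> parent. Simplified and hypothetical, for illustration only.
PARENT = {
    "Linum": "rosids",
    "Castanea": "rosids",
    "rosids": "root",
    "root": None,
}

def lineage(taxon, parent=PARENT):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path

def naive_lca(taxa, parent=PARENT):
    """Lowest node that is an ancestor of (or equal to) every matched taxon."""
    taxa = list(taxa)
    common = set(lineage(taxa[0], parent))
    for t in taxa[1:]:
        common &= set(lineage(t, parent))
    # Walking up from any matched taxon, the first shared node is the lowest one.
    for node in lineage(taxa[0], parent):
        if node in common:
            return node

def weighted_lca(taxa, weights, percent_to_cover=80.0, parent=PARENT):
    """Lowest node whose subtree holds >= percent_to_cover of the matched weight.

    Assumption: this mirrors the idea described above, not MEGAN's exact rules.
    """
    taxa = list(taxa)
    total = sum(weights[t] for t in taxa)
    if total == 0:
        # No uniquely aligned reads anywhere: fall back to the naive placement.
        return naive_lca(taxa, parent)
    covered = {}
    for t in taxa:
        for node in lineage(t, parent):
            covered[node] = covered.get(node, 0) + weights[t]
    threshold = total * percent_to_cover / 100.0
    candidates = [n for n, w in covered.items() if w >= threshold]
    # The deepest candidate is the one with the longest path to the root.
    return max(candidates, key=lambda n: len(lineage(n, parent)))

# Hypothetical weights: many reads align only to Linum, few only to Castanea.
matches = ["Linum", "Castanea"]
print(naive_lca(matches))
print(weighted_lca(matches, {"Linum": 900, "Castanea": 100}))
print(weighted_lca(matches, {"Linum": 500, "Castanea": 500}))
```

With these made-up numbers, Linum carries 90% of the matched weight, which is above the 80% threshold, so the weighted placement stays on Linum even though the read also hits Castanea. With equal weights, neither species reaches 80% and the read moves up to the common ancestor, just as the naive LCA would do.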

If you give me access to the data, then I’d be happy to look at it to confirm that things are working as intended (but I will be away for the next two weeks).

danimfernandes · September 27, 2018, 9:35am

Thanks very much for the explanation, especially of how the weights work. It does make sense. I might just give the naive algo a try and see its results, but after your explanation I think you can close this topic! Thanks.