Why does the long-read LCA put everything in Not Assigned?

coconut21 · October 16, 2018, 6:20pm

I am working with Oxford Nanopore sequencing reads. They are longer but of lower quality.
After running DIAMOND blastx with the long-read settings, I want to use MEGAN’s LCA algorithm to assign each read a taxon and a function, and then explore the data in KEGG.

I have tried working with .daa and .m8 files.
I have played around with lowering the LCA parameters, but I’m not sure what they all mean.
Even when I enter what I think are more lenient parameters, every read is put into the Not Assigned group.
This is odd, since blastx has assigned a taxon and a protein name to the reads.

caner · October 17, 2018, 9:35am

Hi,

Can you please post the set of parameters you used if you ran it from the commandline, or a screenshot of the LCA parameters of the import/meganize dialog?

Have you specified the correct mapping files for Taxonomy and KEGG (prot_acc2tax-June2018X1.abin and acc2kegg-Dec2017X1-ue.abin from http://ab.inf.uni-tuebingen.de/data/software/megan6/download/welcome.html)?

Best,
Caner
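To illustrate why a missing accession-to-taxon mapping would leave every read in Not Assigned, here is a minimal sketch of a plain (naive) LCA in Python. The taxonomy, the accessions (`ACC1` etc.), and the function names are hypothetical toy stand-ins; MEGAN's real mapping files (.abin) and its long-read LCA are more involved and are not reproduced here. The point is only that the LCA operates on taxa, so if no hit accession can be mapped to a taxon, there is nothing to compute the LCA over.

```python
# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}

# Hypothetical accession -> taxon mapping, standing in for a real mapping file.
ACC2TAX = {
    "ACC1": "Escherichia coli",
    "ACC2": "Salmonella enterica",
    "ACC3": "Bacillus subtilis",
}

NOT_ASSIGNED = "Not Assigned"


def path_to_root(taxon):
    """Return the lineage of a taxon, ordered from root down to the taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def naive_lca(hit_accessions, acc2tax):
    """Naive LCA of all hits whose accession can be mapped to a taxon."""
    taxa = [acc2tax[acc] for acc in hit_accessions if acc in acc2tax]
    if not taxa:
        # No hit maps to any taxon, so the read cannot be placed.
        return NOT_ASSIGNED
    lineages = [path_to_root(t) for t in taxa]
    lca = None
    # Walk down all lineages in parallel while they still agree.
    for nodes in zip(*lineages):
        if all(node == nodes[0] for node in nodes):
            lca = nodes[0]
        else:
            break
    return lca


if __name__ == "__main__":
    hits = ["ACC1", "ACC2"]
    print(naive_lca(hits, ACC2TAX))  # mapping available
    print(naive_lca(hits, {}))       # mapping missing
```

With the toy mapping supplied, hits to E. coli and Salmonella resolve to Proteobacteria; with an empty mapping, the same hits give Not Assigned, which mirrors the symptom described in the question.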

coconut21 · October 17, 2018, 8:57pm

Thanks for pointing me in the right direction. I wasn’t uploading the correct mapping files.

I added a screenshot of the parameters I used.
My BLAST results have a lot of gaps and mismatches (probably due to the long, error-prone nanopore reads).
Would these settings be more lenient for such reads?

caner · October 17, 2018, 9:16pm

Hi,

I wouldn’t worry about Min Score, Max Expected, and Min Percent Identity, as there are more advanced filters for long reads that will filter out bad alignments anyway (and since your input is DAA, these have probably already been filtered by DIAMOND…).
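Since the question also asked what these parameters mean, here is a minimal sketch of them as simple per-alignment cut-offs on tabular (.m8 / outfmt 6) rows. It assumes, as a simplification, that Min Score is compared against the bit score, Max Expected against the e-value, and Min Percent Identity against the percent identity column; the `Hit` type, the function names, and the example row are hypothetical and not MEGAN's actual code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_m8_line(line):
    """Parse one row of the default 12-column BLAST/DIAMOND tabular format:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    f = line.rstrip("\n").split("\t")
    return Hit(query=f[0], subject=f[1], pident=float(f[2]),
               evalue=float(f[10]), bitscore=float(f[11]))


def passes_filters(hit, min_score, max_expected, min_percent_identity):
    """Keep an alignment only if it clears all three (assumed) thresholds."""
    return (hit.bitscore >= min_score
            and hit.evalue <= max_expected
            and hit.pident >= min_percent_identity)


if __name__ == "__main__":
    # Hypothetical alignment of a noisy long read: low identity, strong score.
    row = "read1\tACC1\t72.5\t300\t60\t10\t1\t900\t1\t300\t1e-50\t250.0"
    hit = parse_m8_line(row)
    print(passes_filters(hit, min_score=50, max_expected=0.01, min_percent_identity=0))
    print(passes_filters(hit, min_score=50, max_expected=0.01, min_percent_identity=80))
```

The second call shows how a strict identity cut-off can reject an alignment that is otherwise strong, which is the kind of effect error-prone nanopore reads make more likely.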

Percent-to-cover set to 50 doesn’t sound good, though. It will be very strict: the assignments you get will very likely be correct, but they will be very unspecific (placed at higher levels of the taxonomy). For a good balance between specificity and sensitivity, we suggest something around 80%. If your dataset consists of very well-studied organisms, you can go even higher; otherwise, something around 70% might make more sense (at the cost of lower specificity).