Taxonomic assignment of reads

blandx · February 14, 2019, 5:27am

Thanks very much for responding previously and preparing those updated synonym files. I have been using them lately but have run into an issue that might be related to these files. To refresh, I am using MEGAN 6.13.1 on Windows 7. As indicated, I downloaded the most recent SILVA database files for SSU and LSU (SILVA_132_SSURef_Nr99_tax_silva.fasta.gz and SILVA_132_LSURef_tax_silva.fasta.gz) and their respective synonym files (SSURef_Nr99_132_tax_silva_to_NCBI_synonyms.map.gz and LSURef_132_tax_silva_to_NCBI_synonyms.map.gz). The issue is that, when I run blastn on our samples, MEGAN assigns reads at much lower taxonomic levels than might be expected (LCA: min score 100, top percent 10 and min support 5).
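For readers unfamiliar with these LCA settings, here is a minimal Python sketch of how a min score and top percent filter can determine which hits take part in a lowest-common-ancestor assignment. It is not MEGAN's code: the function names, the hit dictionaries and the shortened lineages are all hypothetical, and min support (which acts on per-taxon read counts afterwards) is left out. It only illustrates why a read can land at species level when the closest relative is missing from the reference set.

```python
def select_hits(hits, min_score=100.0, top_percent=10.0):
    """Keep hits with score >= min_score that lie within top_percent of the best score."""
    kept = [h for h in hits if h["score"] >= min_score]
    if not kept:
        return []
    best = max(h["score"] for h in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [h for h in kept if h["score"] >= cutoff]


def lca(lineages):
    """Return the longest common prefix of root-to-leaf lineages."""
    if not lineages:
        return []
    common = []
    for names in zip(*lineages):
        if all(n == names[0] for n in names):
            common.append(names[0])
        else:
            break
    return common


# Hypothetical hits for one read; lineages are shortened for illustration.
hits = [
    {"score": 210.0, "lineage": ["Eukaryota", "Mammalia", "Carnivora", "Ursus maritimus"]},
    {"score": 205.0, "lineage": ["Eukaryota", "Mammalia", "Artiodactyla", "Ovis aries"]},
    {"score": 95.0, "lineage": ["Eukaryota", "Fungi"]},
]

with_sheep = lca([h["lineage"] for h in select_hits(hits)])
# with_sheep == ["Eukaryota", "Mammalia"]

# If the sheep reference were absent, only one hit would survive the filters.
without_sheep = lca([h["lineage"] for h in select_hits([hits[0], hits[2]])])
# without_sheep == ["Eukaryota", "Mammalia", "Carnivora", "Ursus maritimus"]
```

In this toy case, a single surviving reference is enough to pull the read down to species level, whatever its percent identity.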

To give one example, we have matches to polar bear at the species level and, you can take my word for it, there are no polar bears around here. Taking one of the reads and running BLAST against the whole of GenBank reveals a good match to sheep, which makes more sense. There are other similar examples too. Could this be a result of limitations in the database content or something else? I would have expected sheep sequences to be in the LSU database.

What might also be relevant is that the %identity of the matches is low for the assigned taxonomic level (e.g. 88% or 92% at species level). I have tried using the 16S filter (more than 99% identity for species level) but this makes no difference to the final result: the assignment of the reads does not change after applying the filter. Could this be a problem with the synonym files, or is there a bug in the filter? If I use the older “silva2ncbi.map” file, there appear to be fewer incorrectly assigned reads. However, the 16S filter does not seem to work with either the older or the newer synonyms file.
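To make the expected behaviour of such a filter concrete, here is a small Python sketch of a percent-identity cap on the assignable rank. Only the 99% species threshold comes from the description above; the other thresholds are my recollection of the values in the MEGAN documentation and should be treated as assumptions, as should the inclusive boundary and the function name.

```python
# Ranks from most to least specific, with assumed minimum percent identities.
# Only the species value (99) is stated in this thread; the rest are assumptions.
RANKS = ["species", "genus", "family", "order", "class", "phylum"]
MIN_IDENTITY = {
    "species": 99.0,
    "genus": 97.0,
    "family": 95.0,
    "order": 90.0,
    "class": 85.0,
    "phylum": 80.0,
}


def lowest_allowed_rank(percent_identity):
    """Return the most specific rank a match with this identity may be assigned to."""
    for rank in RANKS:
        if percent_identity >= MIN_IDENTITY[rank]:
            return rank
    return None  # below every threshold: assign above phylum level


for identity in (88.0, 92.0, 99.5):
    print(identity, lowest_allowed_rank(identity))
```

Under these assumed thresholds, the 88% and 92% matches would be capped at class and order level respectively, rather than staying at species level.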

Please let me know if you need additional info to assist your enquiries. Many thanks for any effort you make on this problem. Hopefully you can help.

Regards,

David.

Daniel · February 19, 2019, 2:31am

Could you please send me a small example file so that I can investigate this?

blandx · February 26, 2019, 10:36pm

Dear Daniel,

I have sent a link to a file available through Dropbox. Can you please let me know whether you have successfully retrieved the file? I am very interested to hear if there are any issues that can be resolved here.

Regards,

David.

blandx · March 26, 2019, 8:05am

Daniel,

I previously sent you a Dropbox link via email, as this is someone else’s private data which I can’t make publicly available. Could you please let me know if you have successfully retrieved the example file you requested?

If you have obtained the file, could you please let me know if you are having any success in addressing my query?

Many thanks,

David.

Daniel · April 10, 2019, 4:08pm

I have looked into this. Unfortunately, there is a bug in the current version of MEGAN. The percent-identity filter for SSU analyses only works when using the weighted LCA. I have fixed this bug and will release a new version with the fix later this week.