Difference between blast taxonomy and megan taxonomy

marin-e · June 10, 2020, 1:28pm

Hello,

I aligned a read against NCBI-nr using Diamond and obtained matches against 3 accessions of the same organism: Lupinus albus (taxID = 3870). Then, I used daa-meganizer (MEGAN 6_19_1) using the database megan-map-May2020.db.

I observe differences between the taxIDs in my blast output and the assignments made by Megan: my three accessions of the same species (taxID = 3870) were assigned to taxIDs 0, 131567 and 3870 by Megan. Why do they differ? As my blast output already contains the taxID information, is there a way to extract it with daa-meganizer instead of using a database?

The following table presents the differences observed between the blast and Megan results:

Accession      Blastx_taxID   Megan_taxID
KAF1858427.1   3870           0
KAF1858500.1   3870           131567
KAF1858388.1   3870           3870

Thanks

Daniel · June 10, 2020, 4:59pm

The mapping file used by Megan maps each accession in nr to the LCA (lowest common ancestor) of all taxa listed for that accession.
This might be less specific (but safer) than mapping to any one specific taxon associated with a given accession.
Perhaps that is what is going on.
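
To illustrate how an LCA mapping can end up at a higher rank, here is a minimal Python sketch. It is not Megan's actual code: the lineages are heavily abbreviated, and the idea that an accession is listed under both Lupinus albus and a bacterium is an invented example, not something checked for the accessions above.

```python
# Toy lineages, root first. The taxon IDs are real NCBI IDs, but the lineages
# are abbreviated and serve only as an illustration (assumption).
LINEAGES = {
    3870: [1, 131567, 2759, 3870],  # root > cellular organisms > Eukaryota > Lupinus albus
    562: [1, 131567, 2, 562],       # root > cellular organisms > Bacteria > Escherichia coli
}


def lca(taxon_ids):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    lineages = [LINEAGES[t] for t in taxon_ids]
    common = None
    # Walk down from the root and stop at the first rank where lineages diverge.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        common = nodes[0]
    return common


print(lca([3870]))       # accession listed under one taxon only
print(lca([3870, 562]))  # accession listed under two distant taxa
```

With these toy lineages the first call gives 3870 and the second gives 131567 (cellular organisms), which is the kind of result an accession shared between very distant organisms would produce.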

marin-e · September 25, 2020, 3:04pm

Hello,

I am sorry to come back to my question after several months, but I am wondering if there is a way to extract the taxon IDs directly from my DIAMOND results instead of using the MEGAN mapping file (which sometimes doesn’t contain a given accession)?

Thank you very much

Daniel · September 28, 2020, 8:45am

The references themselves do not contain a taxon id, so some sort of mapping is required. Diamond itself supports taxon ids, but I have not used the feature, so please read the documentation for details.
You can download a mapping of all references to ids here:
ftp://ftp.ncbi.nih.gov/pub/taxonomy/
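
As a rough sketch of how such a mapping could be used from Python: the function below scans an accession-to-taxid table and returns the taxon ids for a few accessions. It assumes a tab-separated file with a header line and the columns accession, accession.version, taxid, gi, as used by the prot.accession2taxid files; the file name in the usage note and the function name `load_taxids` are assumptions for illustration, so please check the README on the FTP site for the actual layout.

```python
import gzip


def load_taxids(path, wanted):
    """Return {accession.version: taxid} for the wanted accessions.

    Assumes a tab-separated file with one header line and the columns
    accession, accession.version, taxid, gi (an assumption, see above).
    """
    wanted = set(wanted)
    found = {}
    # Read gzip-compressed files transparently, plain text otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[1] in wanted:
                found[fields[1]] = int(fields[2])
                if len(found) == len(wanted):
                    break  # all requested accessions found, stop early
    return found
```

For example, `load_taxids("prot.accession2taxid.gz", ["KAF1858427.1", "KAF1858500.1", "KAF1858388.1"])` would return the taxon ids recorded for those three accessions, if they are present in the file. The file is large, so the function streams it line by line rather than loading it into memory.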