I’m quite new to MEGAN6 and I’m not sure whether any non-binary reference files exist. I have a contig that, using DIAMOND blastx, matches the following protein reference:
This reference can be found in the NCBI nr database (https://www.ncbi.nlm.nih.gov/protein/YP_002004542), but MEGAN6 does not assign any taxonomy to it, and we don’t know why. How can I check whether this NCBI nr sequence exists in the prot_acc2tax file? We are also wondering whether the problem is related to the bitscore of 94.7, since MEGAN sets the threshold to 95. Is that correct?
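Because the MEGAN .abin mapping file is binary, one possible workaround (an assumption on my part, not a MEGAN feature) is to search NCBI’s plain-text prot.accession2taxid file, the same file mentioned later in this thread. The sketch below assumes NCBI’s usual layout: a header line, then tab-separated columns accession, accession.version, taxid, gi. The function name `find_accession` is hypothetical.

```python
import gzip
from typing import Optional


def find_accession(path: str, accession: str) -> Optional[tuple[str, int]]:
    """Return (accession.version, taxid) for `accession`, or None if absent.

    Assumes NCBI's prot.accession2taxid layout: a header line followed by
    tab-separated columns accession, accession.version, taxid, gi.
    Works on the plain file or its .gz version.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore malformed or empty lines
            # Accept either the bare accession or the versioned one
            if accession in (fields[0], fields[1]):
                return fields[1], int(fields[2])
    return None
```

For example, `find_accession("prot.accession2taxid.gz", "YP_002004542")` would return the versioned accession and its taxid if the entry is present, or `None` otherwise. It is a linear scan, so on the full file it takes a while, but it needs almost no memory.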
But we still do not understand why MEGAN6 does not assign any taxon to this contig, especially since the accession hit by blastx is included in the reference file. Do you have any idea what’s going on here? Could it be related to the .tree or .map files that MEGAN6 loads when it starts?
Which mapping file are you using? Using the latest mapping file, prot_acc2tax-May2017.abin.zip, downloaded from the MEGAN 6 website, I can parse your two lines and get this:
Hello Daniel!
We realized that the reference file we downloaded from your page was corrupted, although it did not give us any error when we used it. As a result, the file we were using did not include all NCBI entries. We downloaded it again and now there is no problem. Thank you so much for your help!
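Since a corrupted download produced no error here, a quick integrity check before loading the file may help. This sketch assumes the .abin.zip file is a standard ZIP archive and uses Python’s `zipfile`, whose `testzip()` re-reads every member and checks its CRC; the helper name `zip_is_intact` is hypothetical.

```python
import zipfile
import zlib


def zip_is_intact(path: str) -> bool:
    """Return True if the ZIP archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError):
        # Truncated downloads usually lack a valid central directory,
        # and damaged compressed data can raise zlib or EOF errors
        return False
```

Comparing file size or a checksum against the original, if one is published, would be a stronger check, but this catches truncated or obviously damaged downloads.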
Hello Daniel,
I’m trying to get the accession.version IDs for the taxids that MEGAN6 assigned from my DIAMOND blastx output. I use rma2info to process the RMA file and get the following file with two columns (column 1: gene ID, column 2: taxid):

PJOAIHFC_00082 10239
PJOAIHFC_00085 10239
PJOAIHFC_01157 10239
PJOAIHFC_01161 10239
PJOAIHFC_01288 10239
PJOAIHFC_01852 10239
I get 755 unique taxids, but when I look them up in prot.accession2taxid to get the corresponding accession.version IDs, I only find 640. Does this make sense? I would expect to find all of them.
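The lookup described above can be sketched as follows, assuming the rma2info output is whitespace-separated "gene ID, taxid" lines and prot.accession2taxid has NCBI’s usual tab-separated layout (accession, accession.version, taxid, gi). Both function names are hypothetical. One likely reason for the gap, offered as a hedged explanation: MEGAN’s LCA algorithm can place a read on an internal node of the taxonomy (10239 in the sample above is Viruses, a superkingdom), and such higher-level taxids may have few or no proteins mapped directly to them in prot.accession2taxid. Taxids merged or deleted between taxonomy releases could also account for some missing entries.

```python
import gzip
from collections import defaultdict


def read_taxids(rma2info_path: str) -> set[int]:
    """Collect the unique taxids from a two-column 'gene_id taxid' file."""
    taxids: set[int] = set()
    with open(rma2info_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                taxids.add(int(fields[1]))
    return taxids


def accessions_for_taxids(acc2tax_path: str, wanted: set[int]) -> dict[int, list[str]]:
    """Map each wanted taxid to the accession.version IDs that point directly at it.

    Taxids with no direct mapping are simply absent from the result.
    Note: on the full NCBI file some taxids have very many accessions,
    so the lists can grow large.
    """
    opener = gzip.open if acc2tax_path.endswith(".gz") else open
    hits: dict[int, list[str]] = defaultdict(list)
    with opener(acc2tax_path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2].isdigit():
                taxid = int(fields[2])
                if taxid in wanted:
                    hits[taxid].append(fields[1])
    return dict(hits)
```

With `wanted = read_taxids(...)` and `hits = accessions_for_taxids(..., wanted)`, the set `wanted - set(hits)` lists the taxids that have no direct entry, which can then be checked against their rank in the NCBI taxonomy.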