How can I parse a BLAST file with MEGAN when the hit IDs are GenBank or RefSeq assembly accessions?

EdnaCF · April 2, 2021, 1:43pm

Hello!
I created my own database using GenBank and RefSeq assembly accession numbers (e.g. GCA_001281445.1, GCF_001281445.1) and aligned my sequences against it, so the hits in my BLAST output file are identified by those IDs. When I parse the BLAST output with MEGAN, the hits get no taxonomic assignment because of the IDs the file contains. Is there a way for MEGAN to use these IDs, since I don't have the sequence IDs (e.g. CP010818.1)?

Daniel · April 28, 2021, 8:19am

You need to provide a mapping of the accession numbers (such as GCA_001281445) to taxa. This can be supplied in a simple tab-separated format (and given to MEGAN or meganizer as an "accession mapping file"):

GCA_001281445 <tab> 65058

Note that GCA_001281445 is provided without the version number (in this case .1) and that 65058 is the corresponding NCBI taxon id (in this case Corynebacterium ulcerans).
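
If you have many accessions, a small script can produce the file in that format. The following is only a minimal sketch in Python, assuming you already have (accession, taxon id) pairs from somewhere, for instance from NCBI's assembly summary tables. The function names `strip_version` and `write_mapping` are made up for illustration and are not part of MEGAN.

```python
import io


def strip_version(accession: str) -> str:
    """Drop a trailing numeric version, e.g. 'GCA_001281445.1' -> 'GCA_001281445'."""
    base, dot, version = accession.rpartition(".")
    # Only strip when there is a dot followed by digits; otherwise keep as-is.
    return base if dot and version.isdigit() else accession


def write_mapping(pairs, out) -> None:
    """Write 'accession<TAB>taxon_id' lines, one per unique unversioned accession.

    `pairs` is any iterable of (accession, taxon_id); `out` is a writable text
    file object. Where the pairs come from is up to you (assumption).
    """
    seen = set()
    for accession, taxon_id in pairs:
        key = strip_version(accession.strip())
        if key in seen:
            continue  # keep the first taxon id seen for each accession
        seen.add(key)
        out.write(f"{key}\t{int(taxon_id)}\n")


# Example with the accessions from this thread.
example_pairs = [
    ("GCA_001281445.1", 65058),
    ("GCF_001281445.1", 65058),
]
buffer = io.StringIO()
write_mapping(example_pairs, buffer)
print(buffer.getvalue(), end="")
```

To write a real file, pass an opened file instead of the in-memory buffer, e.g. `with open("accession_map.tsv", "w") as fh: write_mapping(pairs, fh)` (the file name here is arbitrary).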

EdnaCF · May 4, 2021, 4:41pm

Thanks Daniel H., with your help I was able to carry out my task.

I generated my accession mapping file as you indicated, and MEGAN6 now shows the taxonomic tree for my BLASTn alignment against my own database.