Adding custom entries to mapping database

dportik · August 13, 2020, 7:12pm

Hi Daniel,
Thanks for investing the time to convert the mapping DB files to SQLite files - it makes them very accessible.

I have a question about customizing the mapping db files for a particular use-case:

Is it possible to include a custom mapping entry for an organism that is not currently present in the NCBI taxonomy?

For example, let’s say we have a specific strain of species X (or a new species Y), and we would like that to show up in MEGAN. I can certainly make up a unique “accession” number for the sequences in the reference database used, and also add those to the mapping db. However, what should be used for the taxonomy ID in the mapping db? The taxonomy ID number for species X would not be ideal, because we would like to distinguish this “new” organism from species X. Similarly, if there is no entry for species Y, is there a possible workaround to allow some type of taxonomic labeling?

Any advice would be greatly appreciated!

Thanks,
Dan

Daniel · August 14, 2020, 7:19am

Hi Dan

MEGAN uses two files to specify the taxonomy that it displays:

First, ncbi.tre contains the hierarchy in Newick format. This describes a rooted tree. Each node is labeled by an integer that represents a taxon.

It should be easy to update this file if all you want to do is add a sister node. Say that X has taxon id 333085 and you would like to add a sister node Y with (fake) taxon id 2000333085 to the tree. Searching for 333085 finds the number here:

…,333082,333083,333084,333085,333086,333087,333088,…

Edit thus:
…,333082,333083,333084,333085,2000333085,333086,333087,333088,…
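If you would rather script the edit than do it by hand, something along these lines should work. It is only a sketch: the function name add_sister_node is made up for illustration, and it assumes the tree carries plain integer labels with no branch lengths and that X is not the root. The digit lookarounds matter because the fake id 2000333085 itself ends in 333085.

```python
import re


def add_sister_node(newick: str, existing_id: int, new_id: int) -> str:
    """Return the Newick string with new_id added as a sister of existing_id.

    Hypothetical helper: assumes integer node labels, no branch lengths,
    and that existing_id is not the root of the tree.
    """
    # Refuse to reuse an id that is already a label somewhere in the tree.
    if re.search(rf"(?<!\d){new_id}(?!\d)", newick):
        raise ValueError(f"taxon id {new_id} is already present in the tree")

    # Match the id only as a whole number, never as part of a longer one.
    pattern = re.compile(rf"(?<!\d){existing_id}(?!\d)")
    hits = len(pattern.findall(newick))
    if hits != 1:
        raise ValueError(f"expected exactly one node {existing_id}, found {hits}")

    # A subtree ends with its label, so appending ",new_id" adds a sibling.
    return pattern.sub(f"{existing_id},{new_id}", newick, count=1)
```

Read ncbi.tre into a string, pass it through add_sister_node, and write the result to a new file so that the original stays untouched.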

Second, ncbi.map contains the mapping of integers to names. Add a line like this:
2000333085 name-for-new-species-Y
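Before adding the line, it is worth checking that the fake id is not already taken. A minimal sketch, assuming the taxon id is the first whitespace-separated field on each line of ncbi.map; the helper name add_map_entry and the tab default for sep are my assumptions, so copy the separator (and any extra columns) from the existing lines in your copy of the file.

```python
def add_map_entry(map_lines, taxon_id, name, sep="\t"):
    """Return a new list of map lines with an entry for taxon_id appended.

    Hypothetical helper: assumes the id is the first field on each line.
    The separator is an assumption; match whatever your ncbi.map uses.
    """
    wanted = str(taxon_id)
    for line in map_lines:
        fields = line.split()
        # Skip blank lines; compare ids as strings to avoid parsing errors.
        if fields and fields[0] == wanted:
            raise ValueError(f"taxon id {taxon_id} already exists in the map")
    return list(map_lines) + [f"{taxon_id}{sep}{name}"]
```

For example, add_map_entry(lines, 2000333085, "name-for-new-species-Y") leaves the input list unchanged and returns a copy with the new entry at the end.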

Then supply the edited files to MEGAN using Edit->Preferences->Supply Alternative Taxonomy…

The original files ncbi.tre and ncbi.map can be extracted from MEGAN.jar.
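A jar is an ordinary zip archive, so any unzip tool will do. As a rough sketch in Python: the function name extract_taxonomy_files is made up, and because I have not given the path of the files inside the jar, the code simply searches every entry for the two file names.

```python
import zipfile
from pathlib import Path

# File names mentioned above; their location inside the jar is not assumed.
WANTED = ("ncbi.tre", "ncbi.map")


def extract_taxonomy_files(jar_path, out_dir):
    """Copy ncbi.tre and ncbi.map out of a jar and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            # Compare only the final path component of each archive entry.
            base = entry.rsplit("/", 1)[-1]
            if base in WANTED:
                target = out / base
                target.write_bytes(jar.read(entry))
                written.append(str(target))
    return written
```

If the returned list is empty, the files are probably stored elsewhere in your MEGAN installation.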

dportik · August 15, 2020, 12:35am

Hi Daniel,
Thank you for the quick response!

If I am using sam2rma for conversion, will I need to edit these taxonomy files and supply them to the sam2rma program first? Or can the edited taxonomy files be used after the RMA is first created with the default taxonomy files (e.g., by importing the alternative files via the MEGAN preferences tab as you suggested)?

Currently I am working with large SAM files from minimap2, so converting to RMA with sam2rma is strongly preferred.