Updated taxonomy/mapping file?

Hi, I work with viral metagenomics and use MEGAN for taxonomic classification of sequence reads that have been aligned with DIAMOND (blastx). The current mapping file for MEGAN that can be downloaded is from Oct. 2019. Would it be possible to get an updated mapping file?

Regards,
Anne-Lie

I have just started downloading all the files and will build a new release that should be ready by the end of the week.

Great, thanks :)
/Anne-Lie

Daniel: Is producing the sqlite mapping db something that you can walk an end user through? It would be nice to have more frequent updates to the mapping file, and I don’t mind doing the work of producing the sqlite db.

Thanks,
Greg

Dear Greg, I have been working on producing a new version of the mapping file for two weeks now. The work is not so much updating the taxonomy as updating all the classifications. It should be done very soon.

Building your own mapping file should be easy.

To understand the file, open it in sqlite3. Take a look at the schema:

sqlite> .schema
CREATE TABLE info(id TEXT PRIMARY KEY, info_string TEXT, size NUMERIC );
CREATE TABLE mappings (Accession PRIMARY KEY , Taxonomy INT, SEED INT, EGGNOG INT, INTERPRO2GO INT) WITHOUT ROWID;

The database contains two tables. The info table just contains some general stuff:

sqlite> select * from info;
general|Created 2019-10-15 07:02:08|222275394
Taxonomy|Source: prot_acc2tax-Oct2019.map.gz|222275394
SEED|Source: acc2seed-May2015XX.map.gz|4018092
EGGNOG|Source: acc2eggnog-Oct2019.map.gz|6270842
INTERPRO2GO|Source: acc2interpro-Oct2019.map.gz|27762347

The first entry in each row is the name of the classification, the second is an info string and the last is the number of entries for that classification. A row with key general lists the generation date and number of rows in the mappings table.
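For anyone who prefers a script to the sqlite3 shell, here is a minimal sketch of reading the info table with Python's built-in sqlite3 module. The function names and the file name in the comment are made up for illustration, and the check that the general row matches the row count of mappings is based only on the description above.

```python
import sqlite3


def read_info(con):
    """Return {id: (info_string, size)} for every row of the info table."""
    query = "SELECT id, info_string, size FROM info"
    return {row[0]: (row[1], row[2]) for row in con.execute(query)}


def general_size_matches(con):
    """Check that the 'general' row reports the actual row count of mappings."""
    info = read_info(con)
    total = con.execute("SELECT COUNT(*) FROM mappings").fetchone()[0]
    return info["general"][1] == total


# Usage (hypothetical file name, replace with your own mapping file):
# con = sqlite3.connect("megan-map.db")
# for name, (info_string, size) in read_info(con).items():
#     print(name, info_string, size)
```

Note that COUNT(*) has to scan the whole mappings table, so the check can take a while on a full-size file.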

The mappings table contains NCBI accessions as keys and then integer identifiers for the different classifications:

sqlite> select * from mappings limit 1000000, 10;
ABE34864|266265|1175|2375|39458
ABE34872|266265||24869|20449
ABE34876|266265|||
ABE34880|266265|1492|3284|
ABE34881|266265|5690|155401|
ABE34885|266265|||
ABE34886|266265|||17039
ABE34904|266265||1846|
ABE34905|266265|1735|405|
ABE34909|266265|||39420
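Looking up a single accession works the same way from a script. The sketch below uses Python's built-in sqlite3 module; the function name is hypothetical. The empty fields in the listing above are presumably NULL, but since that is an assumption the code also skips empty strings.

```python
import sqlite3

# Classification columns, in the order given by the schema.
COLUMNS = ("Taxonomy", "SEED", "EGGNOG", "INTERPRO2GO")


def lookup(con, accession):
    """Return {classification: id} for one accession, or None if it is absent."""
    row = con.execute(
        "SELECT Taxonomy, SEED, EGGNOG, INTERPRO2GO "
        "FROM mappings WHERE Accession = ?",
        (accession,),
    ).fetchone()
    if row is None:
        return None
    # Drop classifications that have no value for this accession.
    return {name: value for name, value in zip(COLUMNS, row)
            if value not in (None, "")}
```

Because Accession is the primary key of a WITHOUT ROWID table, this lookup goes straight through the primary-key index and should stay fast even on a file with hundreds of millions of rows.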

So, you can create your own mapping file by setting up the two tables and filling them with values, using sqlite3 commands.
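As a rough illustration, the two tables could be set up and filled from Python's built-in sqlite3 module as in the sketch below. This is not the Java program used for the official files. The function name, its arguments and the way the size column is computed (number of non-empty entries per classification) are assumptions based on the schema and the info output shown above.

```python
import sqlite3
from datetime import datetime

# Classification columns, in the order given by the schema.
CLASSIFICATIONS = ("Taxonomy", "SEED", "EGGNOG", "INTERPRO2GO")


def fill_mapping_db(con, rows, sources):
    """Create the info and mappings tables and fill them.

    rows:    iterable of (accession, taxonomy, seed, eggnog, interpro2go),
             with None where an accession has no id for a classification.
    sources: dict mapping a classification name to its info string.
    """
    with con:  # commit on success, roll back on error
        con.execute(
            "CREATE TABLE info(id TEXT PRIMARY KEY, info_string TEXT, size NUMERIC)"
        )
        con.execute(
            "CREATE TABLE mappings (Accession PRIMARY KEY, Taxonomy INT, "
            "SEED INT, EGGNOG INT, INTERPRO2GO INT) WITHOUT ROWID"
        )
        con.executemany("INSERT INTO mappings VALUES (?, ?, ?, ?, ?)", rows)

        # The 'general' row holds the creation date and the total row count.
        total = con.execute("SELECT COUNT(*) FROM mappings").fetchone()[0]
        created = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        con.execute(
            "INSERT INTO info VALUES ('general', ?, ?)",
            ("Created " + created, total),
        )

        # One row per classification; COUNT(column) ignores NULL entries.
        for name in CLASSIFICATIONS:
            if name not in sources:
                continue
            size = con.execute(f"SELECT COUNT({name}) FROM mappings").fetchone()[0]
            con.execute("INSERT INTO info VALUES (?, ?, ?)", (name, sources[name], size))
```

To write a file on disk, pass a connection such as sqlite3.connect("my-map.db") (a made-up file name) and feed rows from your own accession mapping files. Whether MEGAN accepts a file built this way has not been verified here.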

I use a Java program to create the files and will look into making it available.

I have just uploaded a new release of MEGAN and a new release of the mapping files.

That’s fantastic Daniel. Thanks for the information! I know where to get the NCBI prot.accession2taxid mappings. Can you provide the links for the other 3 mappings?

Many thanks,
Greg

All three require quite a bit of work to generate. We are currently working on adding a few more classifications to MEGAN (e.g. MegaRES, CARD, VFDB, perhaps MetaCyc) and there will be a companion paper describing in detail how each is produced.