Mapping taxonomy from DAA files

I have created a number of DAA files using Diamond ver 8.1.8. I understand that DAA files do not contain taxonomy information but require a mapping file to generate the taxonomy.

Can anyone recommend an appropriate taxonomy mapping file for use with MEGAN CE ver 6.5.8? I understand there is a tool (make-acc2ncbi) to do this in the Ultimate Edition, but no equivalent in the CE. I have successfully used all the other mapping files to generate InterPro2GO, SEED, KEGG and EggNOG information.

hi andy_n,

have you tried the prot/nucl_acc2tax mapping files available here: http://ab.inf.uni-tuebingen.de/data/software/megan6/download/welcome.html? They worked for me…
Also, could you please share how you were able to generate SEED and KEGG info? I’ve tried the gi2kegg/seed files available in the above link without success.

Cheers and good luck,
Nsa

Hi nerdynella

thank you for the reply - I have tried the files you suggest in a number of combinations with no success. I was using the weighted LCA option at 80%. Have you had success using those mapping files with the wLCA approach?

Regarding the SEED and KEGG mapping files: using the meganizer GUI I chose the gi2seed-May2015X.bin and the KEGG mapping file from here http://www-ab.informatik.uni-tuebingen.de/data/software/megan5/download/gi2kegg.zip

I’m confused that we’re getting different levels of success using the same mapping files. I wonder whether the files are always downloading completely?

Any thoughts?

Further to the discussion above, I have recently downloaded and unzipped the taxonomy mapping files again but am still not having any luck during the meganizing step. I’m still without any taxonomy.

Makes MEGAN rather useless really

Please let us know exactly what you are doing so that we can figure out what the problem is.

Hello Daniel, thank you for your question.

To start, we are using DIAMOND ver 0.8.22 in conjunction with MEGAN ver 6.5.8. MEGAN is implemented on a cluster and has access to 64 GB RAM. DIAMOND is outputting DAA files using the command:

diamond blastx --db nr_22-08-2016_db --query input.fasta --min-score 55 --out output_OUT --threads 80 --block-size 7.0 --index-chunks 1 --outfmt 100 --tmpdir /das_data/

We have downloaded mapping files for Taxonomy, SEED, EggNOG and InterPro2GO from the MEGAN6 download page and the KEGG mapping file from the MEGAN5 download page. (We have used MEGAN5 for a while and found it very useful.)

I am currently working on a collection of 7 metagenomes. In the MEGANIZER window I have used mapping files for all tabs (i.e. taxonomy, eggNOG, InterPro2GO, SEED and KEGG) and weighted LCA at 80%. Consistently, the output DAA files are mapped successfully in all cases except for the taxonomy. I have tried using the nucl-acc2taxid-August2016 and the nucl-gi2taxid-August2016 files, both individually and together, but without success. I have downloaded and unzipped the files a second time in case the first copies were corrupted, but still with no success. I am sure that I am doing something wrong but cannot see where.

Any help you can provide Daniel would be much appreciated

Andy

hey Andy sorry for getting back late, I thought your questions had been answered.
I think I see the problem: using blastx, you’ve translated your nucleotide sequences to proteins and compared the translated protein seqs to a protein db to obtain potential protein products encoded by your nucleotide query (from NCBI: ftp://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_BLASTGuide.pdf).

So using the nucl mapping files won’t give you any taxonomy output, because the reference sequences your reads matched are proteins. Try using the prot mapping files and see if you get your taxonomy.
Hope this helps.
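In case it is useful, here is the same rule of thumb as a tiny shell sketch. The function name mapping_family is made up for illustration and is not part of DIAMOND or MEGAN; it only encodes the idea that the mapping file has to match the type of the reference database, not the type of your reads.

```shell
#!/bin/sh
# Hypothetical helper (not a DIAMOND or MEGAN command): given the search
# mode that was used, print which family of taxonomy mapping file matches
# the reference database.
mapping_family() {
    case "$1" in
        blastx|blastp) echo "prot" ;;   # reference is a protein db such as NCBI-nr
        blastn)        echo "nucl" ;;   # reference is a nucleotide db (plain BLAST)
        *)             echo "unknown"; return 1 ;;
    esac
}

mapping_family blastx   # prints: prot
```

So for a DIAMOND blastx run like the one above, the answer is the prot files, even though the query reads themselves are nucleotide sequences.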

Btw I wasn’t able to get any SEED or KEGG outputs and I think it’s because I used a newer NCBI-nr db (downloaded Sept 20, 2016) that doesn’t contain gb or ref tags (Changes in NCBI nr database as of September 2016 currently impacting MEGAN) required by Daniel et al. to make mapping files. I had to go back to an old database version (downloaded in June 2016) and an old DIAMOND version - for some reason DIAMOND version 0.8.6 won’t accept an older NCBI-nr db - and my problems were resolved.
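A quick way to see which kind of NR download you have is to look at the first header line of the FASTA file. The sketch below is only an illustration: header_style is a made-up function name, and the two sample headers are invented placeholders, not real NR entries. It assumes that the older downloads start each header with a gi| tag and that the newer ones start with the accession itself.

```shell
#!/bin/sh
# Hypothetical helper: classify a FASTA header line by how it begins.
header_style() {
    case "$1" in
        ">gi|"*) echo "gi-tagged" ;;          # older style with gi|...|ref| or gi|...|gb| tags
        ">"*)    echo "accession-first" ;;    # newer style, accession is the first word
        *)       echo "not-a-header"; return 1 ;;
    esac
}

# Invented sample headers, for illustration only.
header_style '>gi|12345|ref|XX_000001.1| example protein'   # prints: gi-tagged
header_style '>XX_000001.1 example protein'                 # prints: accession-first
```

On a real file you could pass in the first header with something like header_style "$(grep -m1 '^>' your_nr.fasta)", where your_nr.fasta stands for whatever your download is called.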

Cheers,
Nsa

Dear Nsa,

I am currently working on building new mapping files that work with the latest NR releases
D

Hi Daniel,

Are the new mapping files ready? I tried to download those listed on the website but got this message:

The requested URL /data/software/megan6/download/prot_acc2tax-Nov2016.abin.zip was not found on this server.

They are now available

Hi NerdyNella

thank you for the hint - it’s taken me a while to get back to it but what you suggested worked. Funny - I am sure I tried that before without success.

Other users here are having similar patchy results, I wonder if it has more to do with our installation and environment rather than MEGAN?

Have you had any success with your issues with the KEGG and other databases?

Andy

There was an issue using new NR downloads (September 2016 or later) in MEGAN, even when using accession-based mapping files. I have fixed all such issues. If you now use the latest release of MEGAN (6.6.3) and the very latest acc mapping files downloaded from the MEGAN6 download page (built November 2016), then everything should work just fine…

If you are using daa-meganizer, blast2rma or daa2rma, then make sure that you do not change the value of the option

--firstWordIsAccession

It should stay set to true, which is the default value in the latest release of MEGAN. (Only set this to false if you are using an old version of the NR database file, dated August 2016 or earlier.)
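The rule above can be written down as a small shell sketch. The function name first_word_is_accession_value is hypothetical and not part of MEGAN; it assumes you know the month of your NR download in YYYY-MM form and simply prints the value the option should have according to the cut-off described above.

```shell
#!/bin/sh
# Hypothetical helper (not a MEGAN tool): print the value that the
# firstWordIsAccession option should have for an NR file downloaded in
# the given month. Assumes the argument has the form YYYY-MM.
first_word_is_accession_value() {
    ym="${1%-*}${1#*-}"              # "2016-08" -> "201608"
    if [ "$ym" -le 201608 ]; then
        echo "false"                 # NR from August 2016 or earlier
    else
        echo "true"                  # newer NR, keep the default
    fi
}

first_word_is_accession_value 2016-06   # prints: false
first_word_is_accession_value 2016-09   # prints: true
```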