Updating mapping files

marlivlok · May 31, 2016, 3:26am

Is there any way users can easily update the .abin files? Or format the mapping data from the NCBI FTP site so it works with MEGAN? Having up-to-date mappings is really important and appears to be a bottleneck with this program. If someone could tell me how to do this, it would be greatly appreciated.

Daniel · May 31, 2016, 7:13am

A tool for computing such files from files downloaded from NCBI is available in the Ultimate Edition of MEGAN; see tools/ncbi/make-acc2ncbi.

marlivlok · June 13, 2016, 9:45pm

Awesome! I’ve tried running the script, but I keep encountering problems.

Computing map:
Processing file: /Users/vlok/Desktop/prot.accession2taxid.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (908.1s)
Building table:
(Bits: 27, buckets: 134,217,728, bucket size: 3)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at megan6u.B.A.A(Unknown Source)
at megan6u.tools.taxonomy.MakeAccession2TaxonomyMappingFile.run(Unknown Source)
at megan6u.tools.taxonomy.MakeAccession2TaxonomyMappingFile.main(Unknown Source)
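The "Building table" line in this log describes the size of the hash table the tool sets up just before failing: 27 bits correspond to 2^27 = 134,217,728 buckets, which matches the bucket count printed. Assuming "bucket size: 3" means three entry slots per bucket (the log does not say), the table has room for roughly 402.7 million entries, consistent with the build running out of memory at this step. A quick shell check of that arithmetic:

```shell
#!/bin/sh
# Arithmetic behind the log line:
#   (Bits: 27, buckets: 134,217,728, bucket size: 3)
# Assumption: "bucket size" is the number of entry slots per bucket.

bits=27
bucket_size=3

buckets=$((1 << bits))              # 2^27 buckets
slots=$((buckets * bucket_size))    # total entry slots in the table

echo "buckets: $buckets"
echo "slots:   $slots"
```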

I tried to solve the above problem by using:
-XX:-UseGCOverheadLimit

Alas, that gave me the following error:
Computing map:
Processing file: /Users/vlok/Desktop/prot.accession2taxid.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% #
A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00000001048c2212, pid=96676, tid=0x0000000000002403

JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode bsd-amd64 )
Problematic frame:
V [libjvm.dylib+0x2c2212]

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

I currently have 60 GB of RAM allocated to MEGAN, running on an iMac.

Any help would be much appreciated.

Daniel · June 15, 2016, 9:49pm

Did you try giving the program more memory? You can do so by editing the MEGAN.vmoptions file, which is also used to set the memory for the tools. You should probably allow 16G or more. Does that help?
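As a minimal sketch, assuming MEGAN.vmoptions lists one JVM option per line and that the heap limit is the line starting with the standard JVM flag -Xmx, the memory setting could be changed from a shell as below. The helper name set_megan_heap is hypothetical, and the file's location depends on how MEGAN was installed, so the path is passed in rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: set the maximum Java heap (-Xmx) in a vmoptions file.
# Assumption: one JVM option per line; the heap is set by a line starting -Xmx.

set_megan_heap() {
    vmoptions=$1   # path to MEGAN.vmoptions (location depends on the install)
    heap=$2        # new maximum heap, e.g. 16G or 64G
    tmp=$(mktemp) || return 1

    if grep -q '^-Xmx' "$vmoptions"; then
        # Rewrite the existing -Xmx line and leave every other option alone.
        sed "s/^-Xmx.*/-Xmx$heap/" "$vmoptions" > "$tmp"
    else
        # No -Xmx line yet: copy the existing options and append one.
        cat "$vmoptions" > "$tmp"
        printf '%s\n' "-Xmx$heap" >> "$tmp"
    fi

    # Copying onto the existing file keeps its permissions; then clean up.
    cp "$tmp" "$vmoptions" && rm -f "$tmp"
}
```

For example, `set_megan_heap /path/to/MEGAN.vmoptions 64G`, then start MEGAN or the tool again so the JVM picks up the new limit.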

marlivlok · June 16, 2016, 8:26pm

Yes, I originally had 30G of RAM allocated to MEGAN, which I then increased to 60G. The make-acc2ncbi script still won’t run to completion (the memory errors shown above). The make-gi2ncbi script runs fine, which is great for now, but not so much once GI numbers are phased out.

Daniel · June 22, 2016, 9:46am

I plan to update all mapping files over the next two weeks and will fix any bugs along the way…

Daniel · June 22, 2016, 5:34pm

While I am working on this, you can do the following:

gunzip -c prot.accession2taxid.gz | cut -f1,3 | gzip > prot-acc2tax.map.gz

This will compute a text version of the mapping file that can be used with MEGAN.
However, the file will be huge and MEGAN will struggle to read it all in… That is why I have designed the .abin format…
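NCBI's accession2taxid files have four tab-separated columns (accession, accession.version, taxid, gi), so the taxon ID is column 3, and `gunzip -c` is needed to stream the data to stdout instead of decompressing the file in place. Because cutting the wrong column silently produces a useless map, a slightly more defensive variant could check the header first. This is a sketch, assuming the header line names the columns as above; make_acc2tax_map is a hypothetical helper, and it keeps the header line just as the one-liner does.

```shell
#!/bin/sh
# Hypothetical helper: build a two-column accession -> taxid text map from an
# NCBI *.accession2taxid.gz file, after checking that column 3 is "taxid".
# Assumption: tab-separated header "accession accession.version taxid gi".

make_acc2tax_map() {
    src=$1   # e.g. prot.accession2taxid.gz
    dst=$2   # e.g. prot-acc2tax.map.gz

    # Read only the header line; -c decompresses to stdout, leaving src intact.
    col3=$(gunzip -c "$src" | head -n 1 | cut -f3)
    if [ "$col3" != "taxid" ]; then
        echo "column 3 of $src is '$col3', expected 'taxid'" >&2
        return 1
    fi

    # Keep column 1 (accession) and column 3 (taxid), then recompress.
    gunzip -c "$src" | cut -f1,3 | gzip > "$dst"
}
```

For example, `make_acc2tax_map prot.accession2taxid.gz prot-acc2tax.map.gz`.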

Daniel · June 23, 2016, 4:27pm

I have just generated new accession-to-taxon-ID mapping files for MEGAN. As there are hundreds of millions of accessions, MEGAN uses a disk-based hash table to store them (files ending in .abin). I have just uploaded two new ones:
nucl_acc2tax-June2016.abin and prot_acc2tax-June2016.abin
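To illustrate the general idea behind a disk-based hash table, here is a toy shell sketch. It is not the actual .abin layout, which is not described here. Each accession is hashed with `cksum` into one of a fixed number of bucket files, so a lookup only reads one small bucket instead of the whole mapping. NBUCKETS, bucket_of, build_table and lookup are hypothetical names, and the bucket count is tiny compared with the 2^27 buckets reported in the log earlier in this thread.

```shell
#!/bin/sh
# Toy disk-based hash table: entries are spread over bucket files by a hash
# of the key. NOT the .abin format; all names and sizes are hypothetical.

NBUCKETS=16

bucket_of() {
    # cksum prints "<crc> <bytes>"; use the CRC as the hash value.
    h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $((h % NBUCKETS))
}

# Build: read "accession<TAB>taxid" lines from stdin, append each to its bucket.
build_table() {
    dir=$1
    mkdir -p "$dir"
    while IFS="$(printf '\t')" read -r acc tax; do
        printf '%s\t%s\n' "$acc" "$tax" >> "$dir/$(bucket_of "$acc")"
    done
}

# Lookup: hash the key, then scan only that one bucket file.
lookup() {
    dir=$1
    key=$2
    file="$dir/$(bucket_of "$key")"
    [ -f "$file" ] || return 1
    awk -F '\t' -v k="$key" \
        '$1 == k { print $2; found = 1; exit } END { exit !found }' "$file"
}
```

A shell loop like this is far too slow for the full NCBI files; it only shows why a lookup does not need to read every accession.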

I generated these using tools/ncbi/make-acc2ncbi. The program required 75G of memory, so you need to run it on a suitable server.
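Before starting a long run, it may be worth checking how much physical memory the machine actually has. A small sketch, with a hypothetical helper total_mem_gib, reading /proc/meminfo on Linux (values in kB) or `sysctl -n hw.memsize` on macOS (bytes); the 75 GiB threshold is simply the figure quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print total physical memory in whole GiB.
# Linux: MemTotal in /proc/meminfo is in kB. macOS: hw.memsize is in bytes.

total_mem_gib() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
    else
        bytes=$(sysctl -n hw.memsize) || return 1
        echo $((bytes / 1073741824))
    fi
}

need=75   # GiB; figure reported in this thread, newer NCBI files may need more
have=$(total_mem_gib)
if [ -n "$have" ] && [ "$have" -lt "$need" ]; then
    echo "only ${have} GiB of RAM; this build reportedly needed about ${need}G" >&2
fi
```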

marlivlok · June 23, 2016, 8:09pm

Thanks so much. I’ll have to talk to our computer guys about upping the RAM in the future.

cjfields · April 23, 2017, 10:56pm

Just a note: I did get this working on our local system using the latest accession2taxid mapping files from NCBI, and we needed north of 100 GB to get everything to build.