Updating mapping files

Is there any way for users to easily update the .abin files, or to format the files from the NCBI FTP site so that they work with MEGAN? Having up-to-date mappings is really important, and this appears to be a bottleneck with this program. If someone could tell me how to do this, it would be greatly appreciated.

A tool for computing such files from files downloaded from NCBI is available in the Ultimate Edition of MEGAN; see tools/ncbi/make-acc2ncbi.

Awesome! So I’ve tried running the script, but I keep encountering problems.

Computing map:
Processing file: /Users/vlok/Desktop/prot.accession2taxid.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (908.1s)
Building table:
(Bits: 27, buckets: 134,217,728, bucket size: 3)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at megan6u.B.A.A(Unknown Source)
at megan6u.tools.taxonomy.MakeAccession2TaxonomyMappingFile.run(Unknown Source)
at megan6u.tools.taxonomy.MakeAccession2TaxonomyMappingFile.main(Unknown Source)

I tried to solve the above problem by using:

Alas, that gave me the following error:
Computing map:
Processing file: /Users/vlok/Desktop/prot.accession2taxid.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% #
A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00000001048c2212, pid=96676, tid=0x0000000000002403

JRE version: Java™ SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
Java VM: Java HotSpot™ 64-Bit Server VM (25.92-b14 mixed mode bsd-amd64 )
Problematic frame:
V [libjvm.dylib+0x2c2212]

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try “ulimit -c unlimited” before starting Java again

I currently have 60 GB of RAM allocated to MEGAN, running on an iMac.

Any help would be much appreciated.

Did you try giving the program more memory? You can do so by editing the MEGAN.vmoptions file, which is also used to set the memory for the tools. You should probably allow 16G or more. Does that help?
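As a rough illustration of the memory setting, here is a minimal shell sketch that rewrites the maximum heap line in a vmoptions file. It assumes the file holds one JVM option per line and that the heap limit is given by a standard `-Xmx` line; the helper name `set_max_heap` is hypothetical, and the location of MEGAN.vmoptions depends on your installation.

```shell
# Hypothetical helper (not part of MEGAN): set the JVM maximum heap in a
# vmoptions file. Assumes one JVM option per line and a standard -Xmx entry.
set_max_heap() {
    vmoptions=$1   # path to MEGAN.vmoptions (location varies by install)
    heap=$2        # heap size, e.g. 16G
    tmp=$(mktemp)
    # Keep every line except an existing -Xmx setting
    # (grep exits non-zero when nothing is kept, hence "|| true").
    grep -v '^-Xmx' "$vmoptions" > "$tmp" || true
    # Append the new maximum heap size.
    printf '%s\n' "-Xmx$heap" >> "$tmp"
    # Write back in place so the file keeps its original permissions.
    cat "$tmp" > "$vmoptions"
    rm -f "$tmp"
}

# Example invocation (the path is an assumption):
# set_max_heap /path/to/MEGAN.vmoptions 16G
```

Back up the original file before editing it, and restart MEGAN or rerun the tool afterwards so that the new setting is picked up.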

Yes, I originally had 30G of RAM allocated to MEGAN, which I then increased to 60G. The make-acc2ncbi script still won’t run to completion (heap dump problem as stated above). The make-gi2ncbi script runs just fine, which is great for now, but not so much once GI numbers are phased out.

I plan to update all mapping files over the next two weeks, and I will fix any bugs along the way…


While I am working on this, you can do the following:

gunzip -c prot.accession2taxid.gz | cut -f1,3 | gzip > prot-acc2tax.map.gz

This will compute a text version of the mapping file that can be used with MEGAN.
However, the file will be huge and MEGAN will struggle to read it all in… That is why I have designed the .abin format…
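The conversion can also be wrapped in a small shell function, sketched here under some assumptions: the NCBI file is tab-separated with the columns accession, accession.version, taxid, gi (so the taxon ID is the third column), and its first line is a header that should not end up in the map. The function name `make_acc2tax_map` is hypothetical, and whether MEGAN tolerates a header line in the text map has not been checked here.

```shell
# Hypothetical helper: build a two-column accession -> taxid text map from
# an NCBI accession2taxid dump.
# Assumed input columns: accession, accession.version, taxid, gi.
make_acc2tax_map() {
    in_gz=$1    # e.g. prot.accession2taxid.gz
    out_gz=$2   # e.g. prot-acc2tax.map.gz
    # gunzip -c streams to stdout and leaves the input file untouched;
    # tail -n +2 drops the header row; cut keeps accession and taxid.
    gunzip -c "$in_gz" | tail -n +2 | cut -f1,3 | gzip > "$out_gz"
}

# Example invocation (file names taken from the thread):
# make_acc2tax_map prot.accession2taxid.gz prot-acc2tax.map.gz
```

Running `gunzip -c prot-acc2tax.map.gz | head` afterwards is a quick way to confirm that each line holds an accession followed by a numeric taxon ID.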


I have just generated new accession-to-taxon-ID mapping files for MEGAN. As there are hundreds of millions of accessions, MEGAN uses a disk-based hash table to store them (files ending in .abin). I have just uploaded two new ones:
nucl_acc2tax-June2016.abin and prot_acc2tax-June2016.abin

I generated these using tools/ncbi/make-acc2ncbi. The program required 75G of memory, so you need to run this on a suitable server.

Thanks so much. I’ll have to talk to our computer guys to up the RAM in future.

Just a note: I did get this working on our local system using the latest acc2taxid mapping from NCBI, and we needed north of 100GB to get everything to build.