Java exception error while loading new "prot_acc2tax-Mar2018X1.abin" file

Hi

I am using rapsearch2 to find matches for my metagenome data, searching against a local NCBI-NR database. The NR database version I have was downloaded on 07/08/17.

I have successfully run rapsearch2 and obtained *.aln files. From the *.aln output I can see that there are many hits.

Sample:
D00480L:380:HGG5GBCX2:2:1101:5238:3833#0/1 vs WP_012147338.1 bits=164.466 log(E-value)=-37.3965 identity=100% aln-len=81 mismatch=0 gap-openings=0 nFrame=1
Query: 2 KTVGNFIGGQVCLSSSNQTVDVHNPATGQVERRVTQSTAAEVKQAIDVAHQAFADWSRTTPLRRARIMFNFKALLEQHRDE 244
KTVGNFIGGQVCLSSSNQTVDVHNPATGQVERRVTQSTAAEVKQAIDVAHQAFADWSRTTPLRRARIMFNFKALLEQHRDE
Sbjct: 2 KTVGNFIGGQVCLSSSNQTVDVHNPATGQVERRVTQSTAAEVKQAIDVAHQAFADWSRTTPLRRARIMFNFKALLEQHRDE 82
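
For anyone who wants to inspect these hits programmatically, a header line like the one above can be split into fields. This is only a minimal Python sketch: it is not part of RapSearch2 or MEGAN, the function name `parse_aln_header` is made up for illustration, and the pattern is inferred solely from the sample line shown here.

```python
import re

# Assumed layout, inferred from the sample line only:
# "<query> vs <subject> key=value key=value ..."
HEADER_RE = re.compile(r"^(?P<query>\S+) vs (?P<subject>\S+) (?P<rest>.+)$")


def parse_aln_header(line):
    """Return a dict of fields for a hit header line, or None for other lines."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    fields = {"query": m.group("query"), "subject": m.group("subject")}
    # Remaining tokens look like "bits=164.466" or "log(E-value)=-37.3965".
    for token in m.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


sample = (
    "D00480L:380:HGG5GBCX2:2:1101:5238:3833#0/1 vs WP_012147338.1 "
    "bits=164.466 log(E-value)=-37.3965 identity=100% aln-len=81 "
    "mismatch=0 gap-openings=0 nFrame=1"
)
hit = parse_aln_header(sample)
print(hit["subject"], hit["bits"], hit["identity"])
```

The subject field here (`WP_012147338.1`) is the identifier that any taxonomy mapping file has to resolve.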

Then I tried converting the output to MEGAN format using the latest “prot_acc2tax-Mar2018X1.abin” file, but no taxon assignments were found.

MEGAN conversion parameters used:

import blastfile='$INPUT' fastafile='$FASTA' meganfile='$RMA' maxmatches=25 minscore=100 minsupport=25 minComplexity=0.33 useSeed=true useKegg=true paired=false textstoragepolicy=0 mapping='Taxonomy:GI_MAP=true, KEGG:GI_MAP=true,SEED:GI_MAP=true';

When I checked the “log file” I saw this error while loading “prot_acc2tax-Mar2018X1.abin” file.

log file:
Executing: load treeFile=‘ncbi.tre’;
Loading mapping file: ncbi.map
Reading file: ncbi.map: 1261433
Loading taxonomy file: ncbi.tre
Reading file: ncbi.tre: 1261433
Command:
Executing:
MEGAN> Command: setprop MaxNumberCores=4
Executing: setprop MaxNumberCores=4;
MEGAN> Command: set loadAllReadsIntoMemory=true
Executing: set loadAllReadsIntoMemory=true;
MEGAN> Command: load taxGIFile=’…/…/test/prot_acc2tax-Mar2018X1.abin’
Executing: load taxGIFile=’…/…/test/prot_acc2tax-Mar2018X1.abin’;
Loading file: prot_acc2tax-Mar2018X1.abin
java.lang.ArrayIndexOutOfBoundsException: 0
100% (2.0s)
Entries: 1548
100% (2.0s)
MEGAN> Command: load seedGIFile=’/db/megan/040315/gi2seed.map’
Executing: load seedGIFile=’/db/megan/040315/gi2seed.map’;
Loading file: gi2seed.map
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (12.2s)
Entries: 7991905
100% (12.3s)
MEGAN> Command: load keggGIFile=’/db/megan/040315/gi2kegg.map’
Executing: load keggGIFile=’/db/megan/040315/gi2kegg.map’;
Loading file: gi2kegg.map
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (21.6s)
Entries: 3663904
100% (21.6s)
MEGAN> Command: import blastfile=‘APG59.aln’ fastafile=’…/fasta/APG57.fasta’ meganfile=‘rma/APG59.rma’ maxmatches=25 minscore=100 minsupport=25 minComplexity=0.33 useSeed=true useKegg=true paired=false textstoragepolicy=0 mapping=‘Taxonomy:GI_MAP=true, KEGG:GI_MAP=true,SEED:GI_MAP=true’;
Executing: import blastfile=‘APG59.aln’ fastafile=’…/fasta/APG57.fasta’ meganfile=‘rma/APG59.rma’ maxmatches=25 minscore=100 minsupport=25 minComplexity=0.33 useSeed=true useKegg=true paired=false textstoragepolicy=0 mapping=‘Taxonomy:GI_MAP=true, KEGG:GI_MAP=true,SEED:GI_MAP=true’;
Importing data:
Importing data: 1 reads file(s), 1 blast file(s)
Input format: RapSearch2
TextStoragePolicy: Embed matches and reads in MEGAN file
Mapping all reads in memory
Processing APG57.fasta: Processing APG57.fasta
2000000
Processing APG59.aln
Processing APG59.aln
Processing RapSearch2 file(s)
Total reads: 51660
Total no-hits: 0
Total matches: 968730
Matches discarded: 0
Parsing required 267 seconds
Running Data analyzer: Init
Loading SEED tree files
Loading seed.map: 8324
Loading seed.tre: 9801
Loading KEGG tree files
Loading kegg.map: 9567
Loading kegg.tre: 19920
Loading rn.list: 8270
Analyzing all matches
Applying min-support filter
Number of changes due to min-support filter: 0
Number of reads: 51660
Low complexity: 0
With valid hits: 10322
With SEED-ids: 0
With KEGG-ids: 0
Writing classification tables
Number of taxa identified: 1
Number of SEED classes identified: 1
Number of KEGG classes identified: 1
Syncing
Data processor required: 20 secs
MEGAN> Total reads: 51660
Assigned reads: 0
Unassigned reads: 51660
Reads with no hits: 0
Reads low comp.: 0
Induce Taxonomy tree, keeping 2 of 1261433 nodes
Induced taxonomy tree has 2 nodes
Command: quit;
Executing: quit;
Tue May 29 21:17:23 +08 2018
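
A long log like this is easier to triage if the Java exceptions and the read counts are pulled out first. The following is a rough Python sketch, not a MEGAN tool: the function name `summarize_megan_log` is hypothetical, and the line patterns are taken only from the log pasted above.

```python
import re

# Count lines of interest, as they appear in the log above.
COUNT_RE = re.compile(
    r"^(Total reads|Assigned reads|Unassigned reads|With valid hits):\s+(\d+)$"
)


def summarize_megan_log(lines):
    """Collect Java exception lines and a few read counts from log lines."""
    summary = {"exceptions": [], "counts": {}}
    for raw in lines:
        line = raw.strip()
        # Some lines carry the interactive prompt as a prefix.
        if line.startswith("MEGAN> "):
            line = line[len("MEGAN> "):]
        if line.startswith("java.lang."):
            summary["exceptions"].append(line)
        m = COUNT_RE.match(line)
        if m:
            summary["counts"][m.group(1)] = int(m.group(2))
    return summary


log_excerpt = """Loading file: prot_acc2tax-Mar2018X1.abin
java.lang.ArrayIndexOutOfBoundsException: 0
With valid hits: 10322
MEGAN> Total reads: 51660
Assigned reads: 0
Unassigned reads: 51660""".splitlines()

report = summarize_megan_log(log_excerpt)
print(report["exceptions"])
print(report["counts"])
```

In this log the two signals line up: the exception is raised while loading the .abin file, and the run ends with zero assigned reads even though 10322 reads have valid hits.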

Do you have any idea what this error is? Do you have any suggestions for changes to my analysis steps?

Thanks in advance!

You specified the mapping file as a “GI” mapping file. That is incorrect: you must specify it as an “Accession” mapping file.
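
To see why the mapping type matters, it helps to look at what the subject identifiers in the alignment file actually are. Below is a small Python sketch of such a check. It is a heuristic only: the function name `id_kind` and the regular expression are assumptions made for illustration, not NCBI's or MEGAN's own rules, and unusual identifier formats will come back as "unknown".

```python
import re

# Rough pattern for accession[.version] identifiers such as WP_012147338.1.
# This is a simplification, not an official definition.
ACCESSION_RE = re.compile(r"^[A-Z]{1,6}_?\d+(\.\d+)?$")


def id_kind(subject_id):
    """Guess whether a subject identifier is GI-style or accession-style."""
    # Older FASTA headers carried GI numbers, e.g. "gi|12345|ref|...|".
    if subject_id.startswith("gi|") or subject_id.isdigit():
        return "GI"
    if ACCESSION_RE.match(subject_id):
        return "Accession"
    return "unknown"


# The subject from the sample alignment in the question.
print(id_kind("WP_012147338.1"))
```

The sample hit in the question has the subject `WP_012147338.1`, which contains no GI number, so a lookup keyed by GI has nothing to match against.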

Thank you for the reply.

Can I use the following?

load AccessionFile=’$acc2taxfile’ ?

And what is best to use instead of “mapping=‘Taxonomy:GI_MAP=true”?

Thanks!

Where can I find the documentation on this? Thanks :slight_smile:

  1. Did you try using the command line program megan/tools/blast2rma?

  2. Using MEGAN UE, the command for loading a mapping file is:

load mapFile=<file> mapType=<GI|Accession|Synonyms> cName=<EGGNOG|INTERPRO2GO|KEGG|SEED|Taxonomy> [parseTaxonNames={false|true}]; - Loads a mapping file
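
If you generate MEGAN command files from a script, a small helper can catch a wrong mapType before MEGAN runs. The sketch below is in Python and relies only on the syntax quoted above. The function name `build_load_command` is hypothetical, and wrapping the file name in single quotes is an assumption based on how the other commands in this thread are written.

```python
# Allowed values, copied from the command syntax quoted above.
MAP_TYPES = {"GI", "Accession", "Synonyms"}
CLASSIFICATIONS = {"EGGNOG", "INTERPRO2GO", "KEGG", "SEED", "Taxonomy"}


def build_load_command(map_file, map_type, c_name):
    """Return a 'load mapFile=...' command string, validating the options."""
    if map_type not in MAP_TYPES:
        raise ValueError(f"unknown mapType: {map_type}")
    if c_name not in CLASSIFICATIONS:
        raise ValueError(f"unknown cName: {c_name}")
    # Quoting the file name is an assumption, not confirmed syntax.
    return f"load mapFile='{map_file}' mapType={map_type} cName={c_name};"


cmd = build_load_command("prot_acc2tax-Mar2018X1.abin", "Accession", "Taxonomy")
print(cmd)
```

For the file in this thread the relevant combination is mapType=Accession with cName=Taxonomy, since the .abin file maps protein accessions to taxa.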

Thank you, Daniel. I found the solution using the MEGAN6 command line, and it ran successfully. Thank you for the reply. :slight_smile: