Meganizer DAA file ArrayIndexOutOfBoundsException Error

Hi.

I have been following the protocol from Arumugam et al. 2019 for frame-shift correction and am having trouble meganizing my DAA files. I am receiving an out-of-bounds error message and no taxonomy mapping. I'm wondering whether this is a bug or my error.

Thanks for any help you can give on this!

Diamond was run as follows, with a version of the nr database downloaded this week:

diamond blastx --range-culling --top 10 -F 15 --outfmt 100 -c1 -b12 -t /dev/shm -p 44 --query inputfile -d nr --out output_file
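For anyone following along, here is the same command with my reading of each flag annotated (glosses are my interpretation of the DIAMOND docs, so please verify against the manual for your build):

```shell
# My reading of the flags (verify against the DIAMOND manual for your version):
#   --range-culling   cull hits locally along the query rather than globally
#                     (intended for long reads)
#   --top 10          report alignments scoring within 10% of the top hit
#   -F 15             frameshift alignment penalty; enables frameshift-aware mode
#   --outfmt 100      DAA output (the format the meganizer requires)
#   -c1 -b12          index chunks / block size: memory vs. speed trade-off
#   -t /dev/shm       fast temporary directory
#   -p 44             threads
diamond blastx --range-culling --top 10 -F 15 --outfmt 100 -c1 -b12 \
    -t /dev/shm -p 44 --query inputfile -d nr --out output_file
```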

I have tried to meganize the resulting daa file using the command line and gui version of the meganizer, using the megan-map-Oct2019.db (unzipped) as the database.

I am hitting an “ArrayIndexOutOfBoundsException” error with both the extended and fast annotation modes.

Logs from GUI as follows:

Fast mode:

Meganizing file: input.daa
Annotating DAA file using FAST mode (accession database and first accession per line)
Initializing binning…
Using ‘Interval-Union-LCA’ algorithm (51.0 %) for binning: Taxonomy
Using Multi-Gene Best-Hit algorithm for binning: SEED
Using Multi-Gene Best-Hit algorithm for binning: EGGNOG
Using Multi-Gene Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
Total reads: 0
With hits: 0
Alignments: 0
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. INTERPRO2GO: 0
Class. Taxonomy: 0
Class. SEED: 0
Class. EGGNOG: 0
Class. INTERPRO2GO: 0
Loading MEGAN File: input.daa

Extended mode:

Meganizing file: input.daa
Annotating DAA file using EXTENDED mode
Error: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14

Are you sure the DAA file is complete? A good way to check: does DIAMOND complain when you attempt to use the view command to extract the alignments in a different format?
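A minimal sketch of that check (this assumes `diamond` is on your PATH and that `view` fails with an error on a truncated or corrupt DAA file; the filename is a placeholder):

```shell
# Integrity check: try to read the DAA file back out in tabular format.
# We discard the alignments themselves; we only care whether view succeeds.
if diamond view -a input.daa -o /dev/null --outfmt 6; then
    echo "DAA file reads back cleanly"
else
    echo "diamond view complained: DAA file is likely incomplete" >&2
fi
```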

I'm running into a similar error using the same approach (DIAMOND with long reads).

Binning reads…
Binning reads Analyzing alignments
Caught:
java.lang.ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
at megan/megan.daa.io.DAAMatchRecord.parseTranscript(DAAMatchRecord.java:131)
at megan/megan.daa.io.DAAMatchRecord.parseBuffer(DAAMatchRecord.java:93)
at megan/megan.daa.io.DAAParser.readQueryAndMatches(DAAParser.java:294)
at megan/megan.daa.connector.ReadBlockGetterDAA.getReadBlock(ReadBlockGetterDAA.java:125)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:75)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:30)
at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:213)
at megan/megan.core.Document.processReadHits(Document.java:536)
at megan/megan.daa.Meganize.apply(Meganize.java:106)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:250)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:63)

Using the following for diamond:

diamond blastx --range-culling --top 10 -F 15 --outfmt 100 -b8 -c2 --frameshift 15 --query sample.fasta --db /data/diamondDB/nr.dmnd --out sample.fasta.daa

and then daa-meganizer:

~/megan/tools/daa-meganizer -i sample.fasta.daa --mapDB /data/diamondDB/megan-map-Oct2019.db --longReads --lcaAlgorithm longReads --lcaCoveragePercent 51 --readAssignmentMode alignedBases -t 16

Hello,
I am running into the same error when trying to meganize my DAA file, produced following the steps reported in Arumugam et al. 2019. My situation is exactly the same as akwatson's. As you suggested, I checked whether the DAA file was complete using DIAMOND's "view" command, and it does indeed complain.
Do you know how I could solve this problem, or what can cause a DAA file to be incomplete?

Thank you so much.
Ginevra


Thanks for the response, so sorry I missed it.

I have run diamond view. I am hitting out-of-memory errors while loading subject identifiers for all but one output file.

This is using: diamond view -a input.daa -o output.view --outfmt 6

I am running it on a high-memory HPC node that should have over 200 GB of RAM available.

Should I try to reproduce the issue with a smaller dataset instead? Or have I incorrectly specified the output format?

The one file that did work seems strangely formatted for --outfmt 6. The output is only 25 lines long, and in the second line the subject-ID field contains hundreds of accessions run together with no separators at all.

E.g.

query1 WP_121438119.1 100.0 5163 0 0 2269304 2284792 1 5163 0.0e+00 10326.8
query1 WP_121438119.1WP_158420174.1WP_129865494.1(etc, there are several hundred here)
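For anyone wanting to sanity-check their own `--outfmt 6` output for this symptom, here is a small sketch. It assumes well-formed tabular output has 12 tab-separated fields with a single RefSeq-style accession in the subject field; the function and accession pattern are my own, not part of DIAMOND or MEGAN:

```python
import re

# A well-formed --outfmt 6 row has 12 tab-separated fields and a single
# subject accession in field 2. The broken output shown above instead has
# many accessions fused together, e.g. "WP_121438119.1WP_158420174.1...".
ACCESSION = re.compile(r'^[A-Z]{2}_\d+\.\d+$')  # RefSeq-style; an assumption

def check_outfmt6_line(line):
    """Return a list of problems found in one tabular line (empty = OK)."""
    fields = line.rstrip('\n').split('\t')
    problems = []
    if len(fields) != 12:
        problems.append(f'expected 12 fields, got {len(fields)}')
    if len(fields) >= 2 and not ACCESSION.match(fields[1]):
        problems.append(f'malformed subject id: {fields[1][:40]}')
    return problems

good = ('query1\tWP_121438119.1\t100.0\t5163\t0\t0\t2269304\t2284792'
        '\t1\t5163\t0.0e+00\t10326.8')
bad = ('query1\tWP_121438119.1WP_158420174.1WP_129865494.1\t100.0\t5163'
       '\t0\t0\t1\t2\t1\t2\t0.0\t1.0')
```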

Thanks!
Andrew

So it appears that diamond didn’t terminate cleanly. Benjamin Buchfink maintains a separate community page for diamond-specific problems here: http://www.diamondsearch.org

Thanks Daniel. I appreciate you helping diagnose the issue.

For the benefit of others following this thread, I started an issue over on the Diamond github page. It looked like a reproducible problem.

After a bug fix to diamond view, the .daa files appear OK. Any suggestions on how to diagnose this issue?

If you could give me access to a small daa file that triggers the issue then I will debug it

No need to send me a file, I have reproduced the problem and will work on fixing it.


Great, thanks very much!

Release 6.19.1 of MEGAN fixes this issue.

Benjamin Buchfink and I figured out what the problem is. There was a change in DIAMOND (build 134) that broke MEGAN’s DAA parser.
I have updated MEGAN’s DAA parser to be compatible with DAA files created by all releases of DIAMOND (both pre- and post- build 134).
(The change in DIAMOND also makes the DIAMOND view command incompatible between different builds of DIAMOND, when the long-reads option was used).


Thank you for your efforts on this fix. I really appreciate how quickly you were able to find a solution.

Appears to be working on my end with the latest release of MEGAN.

Echoing the above comments, thanks for the fast fix! Everything is running smoothly now, and a quick look suggests the frame-shift correction is working very nicely for my dataset! Thanks again.

Unfortunately this seems to be coming up again now.
I just tried to run daa2rma on a DAA file generated by mapping assembled contigs to NR (mapping the raw reads was impossible with our RAM constraints).
I get this log:

    Version   MEGAN Community Edition (version 6.21.18, built 21 Jan 2022)
    Author(s) Daniel H. Huson
    Copyright (C) 2021 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
    Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
    Loading ncbi.map: 2,302,807
    Loading ncbi.tre: 2,302,811
    Loading ec.map:     8,081
    Loading ec.tre:     8,085
    Loading eggnog.map:    30,875
    Loading eggnog.tre:    30,986
    Loading gtdb.map:   240,103
    Loading gtdb.tre:   240,107
    Loading interpro2go.map:    13,894
    Loading interpro2go.tre:    28,869
    Loading seed.map:       979
    Loading seed.tre:       980
    Caught:
    java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
            at megan/megan.tools.DAA2RMA6.run(DAA2RMA6.java:302)
            at megan/megan.tools.DAA2RMA6.main(DAA2RMA6.java:72)

diamond view does not complain, so I am assuming the DAA file is complete and uncorrupted.

If you could make the file available to me then I will look into this.

I ended up finding an error in my script: it was still using -p true (left over from previous attempts to use the reads instead of the assemblies), but was only supplying one input file.
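For anyone hitting the same symptom: as I understand it, the paired flag tells daa2rma to treat its inputs as paired files, so it expects two input files per sample. A sketch of the mismatch and the fix (filenames and the map-DB path are placeholders, and the flags are my reading of the tool's options):

```shell
# Broken: paired mode enabled, but only one input file supplied
# ~/megan/tools/daa2rma -i contigs.daa -p true -o contigs.rma6 --mapDB megan-map.db

# Fixed: drop the paired flag when supplying a single (unpaired) input
~/megan/tools/daa2rma -i contigs.daa -o contigs.rma6 --mapDB megan-map.db
```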