I have been following the protocol from Arumugam et al. 2019 for frame-shift read correction and am having trouble meganizing my DAA files. I am getting an out-of-bounds error message and no taxonomy mapping. I'm wondering whether this is a bug or my error.
Thanks for any help you can give on this!
DIAMOND was run as follows, with a version of the NR database downloaded this week:
I have tried to meganize the resulting DAA file using both the command-line and GUI versions of the meganizer, with megan-map-Oct2019.db (unzipped) as the mapping database.
I am hitting an "ArrayIndexOutOfBoundsException" error with both the extended and fast annotation modes.
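For reference, the command-line invocation was along these lines (a minimal sketch, assuming MEGAN 6's daa-meganizer tool is on the PATH; file names are placeholders and flag names may differ between MEGAN versions):

```shell
# Sketch of the meganizing step. --longReads matches the long-read
# DIAMOND run from the Arumugam et al. 2019 protocol.
daa-meganizer -i input.daa \
    -mdb megan-map-Oct2019.db \
    --longReads
```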
Logs from GUI as follows:
Fast mode:
Meganizing file: input.daa
Annotating DAA file using FAST mode (accession database and first accession per line)
Initializing binning…
Using 'Interval-Union-LCA' algorithm (51.0 %) for binning: Taxonomy
Using Multi-Gene Best-Hit algorithm for binning: SEED
Using Multi-Gene Best-Hit algorithm for binning: EGGNOG
Using Multi-Gene Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
Total reads: 0
With hits: 0
Alignments: 0
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. INTERPRO2GO: 0
Class. Taxonomy: 0
Class. SEED: 0
Class. EGGNOG: 0
Class. INTERPRO2GO: 0
Loading MEGAN File: input.daa
Extended mode:
Meganizing file: input.daa
Annotating DAA file using EXTENDED mode
Error: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 14
Are you sure the DAA file is complete? A good way to check: does DIAMOND complain when you attempt to use the view command to extract the alignments in a different format?
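Concretely, something like this (file names are placeholders; this mirrors the view invocation used elsewhere in this thread):

```shell
# Try to re-export the alignments in tabular (BLAST outfmt 6) format.
# A truncated or corrupted DAA file should make DIAMOND complain here.
diamond view -a input.daa --outfmt 6 -o check.tsv
```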
Running into a similar error using the same approach (DIAMOND with long reads):
Binning reads…
Binning reads Analyzing alignments
Caught:
java.lang.ArrayIndexOutOfBoundsException: Index 27 out of bounds for length 27
at megan/megan.daa.io.DAAMatchRecord.parseTranscript(DAAMatchRecord.java:131)
at megan/megan.daa.io.DAAMatchRecord.parseBuffer(DAAMatchRecord.java:93)
at megan/megan.daa.io.DAAParser.readQueryAndMatches(DAAParser.java:294)
at megan/megan.daa.connector.ReadBlockGetterDAA.getReadBlock(ReadBlockGetterDAA.java:125)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:75)
at megan/megan.data.AllReadsIterator.next(AllReadsIterator.java:30)
at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:213)
at megan/megan.core.Document.processReadHits(Document.java:536)
at megan/megan.daa.Meganize.apply(Meganize.java:106)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:250)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:63)
Hello,
I am running into the same error when trying to meganize my DAA file, produced following the steps reported in Arumugam et al. 2019. My situation is exactly the same as akwatson's. As you suggested, I checked whether the DAA file was complete using DIAMOND's "view" command, and it does indeed complain.
Do you know how I could solve this problem, or what can cause a DAA file to be incomplete?
I have run diamond view. I am running into out-of-memory errors while loading subject identifiers for all but one output file.
This is using: diamond view -a input.daa -o output.view --outfmt 6
I am running it on a high-memory HPC node that should have around 200 GB of RAM available.
Should I try to reproduce the issue with a smaller dataset instead? Or have I incorrectly specified the output format?
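A quick way to set up a smaller reproduction (a minimal sketch using a synthetic FASTQ just to illustrate; with real data you would subsample your own reads file and rerun DIAMOND on the subset):

```shell
# Generate a tiny synthetic FASTQ (5 records) purely for illustration.
printf '@read%d\nACGT\n+\nIIII\n' 1 2 3 4 5 > reads.fastq
# FASTQ records are 4 lines each, so the first 3 reads are the first 12 lines.
head -n 12 reads.fastq > subset.fastq
wc -l < subset.fastq    # prints 12
```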
The one file that did work seems strangely formatted for --outfmt 6. The output is only 25 lines long, and in the second line the second field contains hundreds of subject IDs that aren't separated at all.
E.g.
query1 WP_121438119.1 100.0 5163 0 0 2269304 2284792 1 5163 0.0e+00 10326.8
query1 WP_121438119.1WP_158420174.1WP_129865494.1(etc, there are several hundred here)
So it appears that DIAMOND didn't terminate cleanly. Benjamin Buchfink maintains a separate community page for DIAMOND-specific problems here: http://www.diamondsearch.org
Benjamin Buchfink and I figured out what the problem is. There was a change in DIAMOND (build 134) that broke MEGAN’s DAA parser.
I have updated MEGAN’s DAA parser to be compatible with DAA files created by all releases of DIAMOND (both pre- and post- build 134).
(The change in DIAMOND also makes the DIAMOND view command incompatible between different builds of DIAMOND, when the long-reads option was used).
Echoing the above comments, thanks for the fast fix! Everything is running smoothly now, and a quick look suggests the frame-shift correction is working very nicely for my dataset! Thanks again.
Unfortunately this seems to be coming up again now.
I just tried to run daa2rma on a DAA file generated by mapping assembled contigs to NR (mapping the raw reads was impossible with our RAM constraints).
I get this log:
Version MEGAN Community Edition (version 6.21.18, built 21 Jan 2022)
Author(s) Daniel H. Huson
Copyright (C) 2021 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,302,807
Loading ncbi.tre: 2,302,811
Loading ec.map: 8,081
Loading ec.tre: 8,085
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 13,894
Loading interpro2go.tre: 28,869
Loading seed.map: 979
Loading seed.tre: 980
Caught:
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at megan/megan.tools.DAA2RMA6.run(DAA2RMA6.java:302)
at megan/megan.tools.DAA2RMA6.main(DAA2RMA6.java:72)
diamond view does not complain, so I assume the DAA file is complete and uncorrupted.
I ended up finding an error in my script: it was still passing -p true (left over from earlier attempts using the paired reads instead of the assemblies) while giving only one input file.
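For anyone hitting the same thing, the fix was simply to drop the paired flag when the input is a single file of contigs. A sketch (placeholder file and database names; -p/--paired is the flag mentioned above, and other flag names may vary by MEGAN version):

```shell
# Single unpaired input (assembled contigs), so no -p/--paired flag:
daa2rma -i contigs_vs_nr.daa \
    -o contigs_vs_nr.rma6 \
    -mdb megan-map.db
```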
Hi @Daniel, I figured it out. I was not creating the .daa file correctly.
I updated my code: ## To produce a .daa with long-read settings (note: the taxonomy can't be added at this step)
diamond blastx -d /lustre/project/taw/kvigil/Reference/refseq_viral_prot_102723/viralprotein.dmnd -q barcode04.fastq.gz -o ONR10623barcode04.daa -f 100 --ultra-sensitive --long-reads
I was able to open this .daa file, but it shows up as empty in MEGAN after I meganize it. Thanks! Katie
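One thing worth checking: the taxonomy (and functional) assignments are added during meganization, not by DIAMOND, so a mapping database has to be supplied at that step or the file stays empty in MEGAN. A sketch of that step (placeholder database name; flags as in MEGAN 6's daa-meganizer, which may differ by version):

```shell
# Meganize the long-read DAA file with a MEGAN mapping database so the
# taxonomy and functional assignments are filled in:
daa-meganizer -i ONR10623barcode04.daa \
    -mdb megan-map.db \
    --longReads
```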