malt-run fails with StringIndexOutOfBoundsException

Hi,

I tried to align sequences with malt-run against a custom database of genomes downloaded from NCBI RefSeq. The alignment itself seems to succeed, but I get the same error for each of my samples when the alignments are analysed:

+++++ Aligning file: ../03-data/hg19_alignment.backup/CSM299_HG19unmapped.fastq.gz
Starting file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
10% 20% 100% (340.2s)
Finishing file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
Binning reads: Initializing...
Initializing binning...
Using Best-Hit algorithm for binning: Taxonomy
Binning reads...
Binning reads: Analyzing alignments
Caught:
java.lang.StringIndexOutOfBoundsException: begin 1000001, end 1000000, length 1000000
        at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3756)
        at java.base/java.lang.String.substring(String.java:1902)
        at megan/megan.rma6.ReadBlockRMA6.read(ReadBlockRMA6.java:295)
        at megan/megan.rma6.ReadBlockGetterRMA6.getReadBlock(ReadBlockGetterRMA6.java:99)
        at megan/megan.rma6.AllReadsIteratorRMA6.next(AllReadsIteratorRMA6.java:77)
        at megan/megan.rma6.AllReadsIteratorRMA6.next(AllReadsIteratorRMA6.java:32)
        at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:213)
        at megan/megan.core.Document.processReadHits(Document.java:547)
        at malt/malt.io.RMA6Writer.close(RMA6Writer.java:245)
        at malt/malt.MaltRun.launchAlignmentThreads(MaltRun.java:455)
        at malt/malt.MaltRun.run(MaltRun.java:345)
        at malt/malt.MaltRun.main(MaltRun.java:77)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at com.install4j.runtime/com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:84)
        at com.install4j.runtime/com.install4j.runtime.launcher.UnixLauncher.start(UnixLauncher.java:66)
        at install4j.malt.MaltRun.main(Unknown Source)
Total reads:               16
With hits:                  16
Alignments:                 75
Assig. Taxonomy:             4
Binning reads: Writing classification tables
Numb. Tax. classes:          3
Binning reads: Syncing
Class. Taxonomy:             3
Analysis written to file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
Num. of queries:   18095318
Aligned queries:     152927
Num. alignments:     663294
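For anyone reading the trace: the topmost frames are just `String.substring(begin, end)` being called with a begin index (1000001) that lies past both the end index and the string length (both 1000000). The Java sketch below reproduces only that JDK behaviour in isolation. The class and method names are hypothetical, it is not MALT or MEGAN code, and it does not explain why `ReadBlockRMA6.read` ends up computing those indices.

```java
// Minimal, self-contained reproduction of the exception type seen in the log.
// Assumption: this only demonstrates java.lang.String.substring semantics;
// it is NOT taken from MALT/MEGAN (class and method names are hypothetical).
public class SubstringBoundsDemo {

    // Builds a string of the given length and calls substring(length + 1, length),
    // i.e. with a begin index past both the end index and the string length.
    // Returns the exception message (or null if, unexpectedly, nothing is thrown).
    static String failingSubstring(int length) {
        String text = "A".repeat(length); // String.repeat requires Java 11+
        try {
            text.substring(length + 1, length);
            return null; // not reached: substring throws for begin > end
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Uses the same indices as the log (begin 1000001, end 1000000, length 1000000).
        // The exact wording of the printed message can differ between JDK versions.
        System.out.println(failingSubstring(1_000_000));
    }
}
```

Running it prints a StringIndexOutOfBoundsException message built from those three numbers, which is the same failure mode as the first lines of the trace.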

The version of MALT was:

Version   MALT (version 0.5.2, built 28 Jan 2021)
Author(s) Daniel H. Huson

and the command was:

malt-run -J-Djavax.accessibility.assistive_technologies=" " -J-XX:ParallelGCThreads=1 -J-Xmx400G -d ../03-data/refdbs/parasites_210705 -o ../04-analysis/parasite/ --mode BlastN --alignmentType SemiGlobal --inFile ../03-data/hg19_alignment.backup/*__HG19unmapped.fastq.gz --numThreads 48 --replicateQueryCache --minPercentIdentity 85.0 --maxAlignmentsPerQuery 10 --topPercent 1 --minSupport 1 --gapOpen 7 --gapExtend 3 --band 4 --minPercentIdentityLCA 90.0 -v

Does anyone have any suggestions as to what could have caused this issue?

When analyzing the MALT output files for the samples in which this error occurred, I could see that the error stops the LCA algorithm: reads are assigned only to a subset of the species that I expected to be present in the sample, and a large fraction of the reads remain unassigned.

Does anyone have an idea what's going on here? Maybe @Daniel?

Thanks!

Sorry that I missed this question. If it is still relevant, I can look into it if you give me access to data that reproduces the issue.

Thanks, @Daniel, it still is, but the database is rather large (> 200 GB even as a compressed tarball), so I will see whether I can reproduce the same error with a reduced database and let you know.