MALT-run fails with StringIndexOutOfBounds exception


I tried to align sequences using malt-run against a custom database of genomes downloaded from NCBI RefSeq. While the alignment itself seems to succeed, I get the same error for each of my samples when the alignments are analysed.

+++++ Aligning file: ../03-data/hg19_alignment.backup/CSM299_HG19unmapped.fastq.gz
Starting file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
10% 20% 100% (340.2s)
Finishing file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
Binning reads: Initializing...
Initializing binning...
Using Best-Hit algorithm for binning: Taxonomy
Binning reads...
Binning reads: Analyzing alignments
java.lang.StringIndexOutOfBoundsException: begin 1000001, end 1000000, length 1000000
        at java.base/java.lang.String.checkBoundsBeginEnd(
        at java.base/java.lang.String.substring(
        at megan/
        at megan/megan.rma6.ReadBlockGetterRMA6.getReadBlock(
        at megan/
        at megan/
        at megan/megan.algorithms.DataProcessor.apply(
        at megan/megan.core.Document.processReadHits(
        at malt/
        at malt/malt.MaltRun.launchAlignmentThreads(
        at malt/
        at malt/malt.MaltRun.main(
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.base/java.lang.reflect.Method.invoke(
        at com.install4j.runtime/com.exe4j.runtime.LauncherEngine.launch(
        at com.install4j.runtime/com.install4j.runtime.launcher.UnixLauncher.start(
        at install4j.malt.MaltRun.main(Unknown Source)
Total reads:               16
With hits:                  16
Alignments:                 75
Assig. Taxonomy:             4
Binning reads: Writing classification tables
Numb. Tax. classes:          3
Binning reads: Syncing
Class. Taxonomy:             3
Analysis written to file: ../04-analysis/parasite/CSM299_HG19unmapped.rma6
Num. of queries:   18095318
Aligned queries:     152927
Num. alignments:     663294

The version of MALT was:

Version   MALT (version 0.5.2, built 28 Jan 2021)
Author(s) Daniel H. Huson

and the command

malt-run -J-Djavax.accessibility.assistive_technologies=" " -J-XX:ParallelGCThreads=1 -J-Xmx400G             -d ../03-data/refdbs/parasites_210705 -o ../04-analysis/parasite/ --mode BlastN --alignmentType SemiGlobal --inFile ../03-data/hg19_alignment.backup/*__HG19unmapped.fastq.gz --numThreads 48 --replicateQueryCache --minPercentIdentity 85.0 --maxAlignmentsPerQuery 10 --topPercent 1 --minSupport 1 --gapOpen 7 --gapExtend 3 --band 4 --minPercentIdentityLCA 90.0 -v

Does anyone have a suggestion as to what could have caused this issue?

When analyzing the MALT output files for the samples in which this error occurred, I could see that the error appears to stop the LCA algorithm: reads were assigned to only a subset of the species that I expected to be present in the sample, and a large fraction of the reads remain unassigned.

Does anyone have any suggestions as to what’s going on here? Maybe @Daniel?


I am sorry that I missed this question… If it is still current, I can look at it if you give me access to data that causes the issue.

Thanks, @Daniel, it still is, but the database is rather large (> 200 GB after compressing it in a tarball). So I will see if I can get the same error message with a reduced database and let you know.
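
In case it helps to narrow this down: the exception message itself says that String.substring was called on a string of length 1000000 with a begin index one past its end (1000001) and an end index of 1000000. Below is a minimal sketch that triggers the same kind of failure in plain Java (11 or newer, because of String.repeat). The class and method names are made up for illustration and have nothing to do with the MALT or MEGAN code; it only shows what the message means, not why MALT ends up there.

```java
// Hypothetical demo class, not part of MALT or MEGAN.
public class SubstringBoundsDemo {

    // Builds a string of the given length, calls substring(begin, end) on it,
    // and returns the exception message (or "no exception" if the call succeeds).
    static String describeFailure(int length, int begin, int end) {
        String text = "A".repeat(length);
        try {
            text.substring(begin, end);
            return "no exception";
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the log: begin is one past the end of the string.
        System.out.println(describeFailure(1_000_000, 1_000_001, 1_000_000));
        // A valid range on the same string does not throw.
        System.out.println(describeFailure(1_000_000, 999_990, 1_000_000));
    }
}
```

On the JDK versions I am aware of, the first call should print a message of the same form as the one in the log, and the second should print "no exception". Whether the 1000000 is a fixed buffer size somewhere in the read-block code is only a guess on my part.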