Malt-build ArrayIndexOutOfBoundsException

Dear MEGAN community,

I get an error at the malt-build step and currently have no clue what the problem could be. I would really appreciate your help.

The command that I use is this:

malt-build -i /MALT/FASTA/*fna.gz -d /MALT/FASTA/OUT --sequenceType DNA --threads 10 --mapDB /NCBI_TAXONOMY_MALT/megan-nucl-Jan201.db --verbose

And this is the log file:

MaltBuild - Builds an index for MALT (MEGAN alignment tool)

Options:
Input:
--input: /MALT/FASTA/GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna.gz /MALT/FASTA/GCF_000001405.39_GRCh38.p13_genomic.fna.gz … (there are 722 *fna.gz files in total) …
--sequenceType: DNA

Output:
--index: index

Performance:
--threads: 10
--step: 1

Seed:
--shapes: default
--maxHitsPerSeed: 1000

Classification support:

Deprecated classification support:
--parseTaxonNames: true
--mapDB: /NCBI_TAXONOMY_MALT/megan-nucl-Jan201.db
--noFun: false

Other:
--firstWordIsAccession: true
--accessionTags: gb| ref|
--firstWordOnly: false
--random: 666
--hashScaleFactor: 0.9
--buildTableInMemory: true
--extraStrict: false
--verbose: true

Version MALT (version 0.5.3, built 4 Aug 2021)
Author(s) Daniel H. Huson
Copyright © 2021 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.

Classifications to use: Taxonomy
Reference sequence type set to: DNA
Seed shape(s): 111110111011110110111111
Deleting index files: 3
Number input files: 722

Loading FastA files:
100% (0.1s)

Caught:
java.lang.ArrayIndexOutOfBoundsException: Index -70 out of bounds for length 127
at malt/malt.data.DNA5.getNormalized(DNA5.java:64)
at malt/malt.io.FastAFileIteratorBytes.next(FastAFileIteratorBytes.java:159)
at malt/malt.data.ReferencesDBBuilder.loadFastAFile(ReferencesDBBuilder.java:179)
at malt/malt.data.ReferencesDBBuilder.loadFastAFiles(ReferencesDBBuilder.java:162)
at malt/malt.MaltBuild.run(MaltBuild.java:269)
at malt/malt.MaltBuild.main(MaltBuild.java:70)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at com.install4j.runtime/com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:84)
at com.install4j.runtime/com.install4j.runtime.launcher.UnixLauncher.start(UnixLauncher.java:66)
at install4j.malt.MaltBuild.main(Unknown Source)

It breaks while loading the files, so I suspect that some of the FASTA files in my folder are broken, but unfortunately I cannot tell which ones from the error.

The memory parameter is set to 1000 GB. The total size of all FASTA files is 32 GB.

Best,
Vasilina

Sorry for the disturbance, the problem really was with the FASTA files.

Could you post what was wrong with the FASTA files? I am getting the same index error with the NCBI nt database. I downloaded all the nt.*.tar.gz files, and when I run malt-build, I get the ArrayIndexOutOfBoundsException error. Thanks. Balaji.

Hi Balaji,

It turned out that when we downloaded multiple files with wget, many of the downloads did not finish, so those files were truncated and their gzip compression was invalid. You can check each file in your folder with a command such as gunzip -t filename, which reports whether a file is compressed correctly. After checking, we re-downloaded the broken files.
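
To check a whole folder in one go instead of file by file, a small shell function along these lines should work. This is only a sketch: the function name find_broken_gz and the output file broken_files.txt are made up for illustration, and it uses gzip -t, which runs the same integrity test as gunzip -t.

```shell
# Print the name of every file that fails the gzip integrity test.
# find_broken_gz is a hypothetical helper name, not part of MALT or gzip.
find_broken_gz() {
    for f in "$@"; do
        # gzip -t decompresses in test mode only and returns non-zero
        # for truncated or otherwise corrupt files.
        if ! gzip -t -- "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}
```

For the folder from the original post this could be run as find_broken_gz /MALT/FASTA/*fna.gz > broken_files.txt, and the files listed in broken_files.txt are then the ones to download again.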
Hope it will work for you.

Best,
Vasilina