Memory limits of Malt-build

Hello,

I am trying to build a MALT index for all NCBI bacterial genomes (~145,000 files totalling more than 500 GB). Is this possible with MALT? I allocated 900 GB of RAM when setting it up on our server, but I still got a Java out-of-memory error. The build had started writing a massive index file before it failed, so I doubt my server would have the capacity even if it were possible. If it is not possible, can you recommend a set of bacterial reference genomes that would work well?

Along the same lines, is it possible to build a MALT index of NCBI nr or nt, or are they too large? Is MALT designed only for subsets of the NCBI databases?

Sorry for all the questions. I am keen to get started with MALT.

Hugh

Dear Hugh,

MALT uses a hash-based index of all sequences, and unfortunately that approach does not scale well to the size of today’s reference databases.
For very large databases, I recommend DIAMOND for DNA-protein or protein-protein alignment and minimap2 for DNA-DNA alignment.
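To get a feel for why a hash-based index over every sequence can outgrow even 900 GB of RAM, here is a rough back-of-envelope sketch. It is NOT MALT's actual data layout: the model assumes (hypothetically) that every reference position is stored as one hash-table entry of 8 bytes, that the table is kept at 75% occupancy, and that 500 GB of input corresponds to roughly 500 billion bases (uncompressed FASTA, ignoring headers and newlines). The function name `naive_seed_index_bytes` is illustrative only.

```python
GB = 10**9


def naive_seed_index_bytes(reference_bases: int,
                           bytes_per_entry: int = 8,
                           load_factor: float = 0.75) -> float:
    """Rough memory for a hash table holding one entry per reference position.

    Hypothetical model, not MALT's real index: every position is indexed as a
    seed, each entry costs `bytes_per_entry` bytes, and the table is sized so
    that it is only `load_factor` full.
    """
    return reference_bases * bytes_per_entry / load_factor


# Assumption: ~500 GB of uncompressed FASTA, treating one byte as one base.
reference = 500 * GB
needed = naive_seed_index_bytes(reference)
print(f"~{needed / GB:,.0f} GB needed vs 900 GB available")
# prints ~5,333 GB needed vs 900 GB available
```

Under these assumptions the index grows linearly with the reference size, so doubling the database roughly doubles the memory needed; the exact constants for MALT will differ, but the linear scaling is the point.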