Sam2RMA fails with String or BLOB exceeds size limit


I’ve written a wrapper script that runs MEGAN 6 on my long-read SAM files. All other files complete fine, but this one fails with the error pasted below.

The SAM was obtained by mapping the FASTQ to a DIAMOND database of NCBI nr, roughly following this workflow for Nanopore long reads…

Any debugging tips appreciated. Thanks!

Current SAM file: bams/aligned/NCBI_nr/barcode03.sam
Reads file:   fastqs/decontaminated/barcode03.fastq
Output file:  megan_diamond/rmas/barcode03.rma
Classifications: Taxonomy, SEED, EGGNOG, GTDB, EC, INTERPRO2GO
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file barcode03.sam
Parsing file: bams/aligned/NCBI_nr/barcode03.sam
Input domination filter: MinPercentCoverToStronglyDominate=90.0 and TopPercentScoreToStronglyDominate=90.0
10% 20% 30% Caught:
org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@ Method)
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@$executeQuery$1(
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at megan/megan.accessiondb.AccessAccessionMappingDatabase.getValues(
        at megan/megan.rma6.RMA6FromBlastCreator.parseFiles(
        at megan/
        at megan/
        at megan/

Hi @bioinfodonk,

Would it be possible to share this file? I also recommend using MEGAN7. You can upload the file to a drive and share the link with us.

Best regards,

Hi @Anupam,

The files are very large (~20Gb for the FASTQ and SAM each), but I’m working on it. Unfortunately using MEGAN7 isn’t currently possible in my pipeline.

Appreciate any tips you might have.

Hello @Anupam,

Here’s the link: Dropbox

Thanks @bioinfodonk, will update you soon.


Hello @Anupam, any chance you’ve found a solution? Thank you.

Hi @bioinfodonk,

Sorry for the delay! I’ve been caught up with some work, but I’ll look into it soon and get back to you. Thanks for your patience!


Hi @bioinfodonk,

I ran the provided SAM file on my server using MEGAN6 Ultimate Edition and was able to reproduce the error on my end. However, I noticed that the FASTQ file you provided appears to be a BAM file. If you could send the raw FASTQ file, I’d be happy to check it again for you.

Regarding the issue, I believe it stems from DIAMOND reporting a large number of alignments per read—a common occurrence with long-read alignments due to the vast number of entries in the NCBI-nr database. Here are a couple of suggestions to handle this:

  1. Use a DAA file instead of a SAM file:

    • DAA files are optimized for MEGAN and generally more efficient to process.
  2. Limit the number of reported alignments per read:

    • Instead of using --top 5 in your DIAMOND command, try:
      -F 5000 --range-culling -k 25
    • This will report the top 25 alignments per read rather than all alignments scoring within 5% of the best one, helping MEGAN process the file without overwhelming the system.
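
If the DIAMOND call lives in a wrapper script, the swap could look roughly like the sketch below. It is only an illustration: the function name `build_diamond_cmd`, the database path `nr.dmnd` and the default values are assumptions, since the original command was not posted. The flags themselves (`-d`, `-q`, `-o`, `-f 101` for SAM output, `-F`, `--range-culling`, `-k`) are standard DIAMOND options.

```python
import shlex


def build_diamond_cmd(db, reads, out_sam, max_hits=25, frameshift=15):
    """Return a DIAMOND blastx argument list that caps hits per read with -k.

    All paths are supplied by the caller; nothing here is taken from the
    original (unposted) command.
    """
    return [
        "diamond", "blastx",
        "-d", db,                # DIAMOND database (hypothetical path)
        "-q", reads,             # input reads
        "-o", out_sam,           # output file
        "-f", "101",             # 101 = SAM output
        "-F", str(frameshift),   # frameshift penalty (assumed default)
        "--range-culling",
        "-k", str(max_hits),     # used instead of --top 5
    ]


if __name__ == "__main__":
    cmd = build_diamond_cmd(
        "nr.dmnd",  # assumed database name
        "fastqs/decontaminated/barcode03.fastq",
        "bams/aligned/NCBI_nr/barcode03.sam",
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Printing the command first makes it easy to confirm that `--top` is gone before launching a long alignment run.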

The error occurs because MEGAN looks up accessions in its SQLite mapping database. With a very large number of alignments, more accession lookups are required, and the lookup can apparently grow past SQLite's size limit (the "statement too long" part of the error). Limiting the reported alignments can help avoid this issue.
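
To check whether a few reads really do carry an unusually large number of alignments, a small script along these lines could help. It is a minimal sketch that assumes a standard text SAM file, where header lines start with `@` and the read name is the first tab-separated column. The function name `count_alignments_per_read` is just illustrative.

```python
import sys
from collections import Counter


def count_alignments_per_read(sam_lines):
    """Count alignment records per read name (SAM column 1)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        counts[qname] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line so a ~20 GB SAM is never held in memory.
    with open(sys.argv[1]) as handle:
        counts = count_alignments_per_read(handle)
    # Show the ten reads with the most alignments.
    for read, hits in counts.most_common(10):
        print(f"{hits}\t{read}")
```

Running it on the failing SAM and on one that completes fine would show whether the failing file stands out in its maximum number of alignments per read.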

I also noticed that you’re using a high frameshift penalty. Is there a specific reason for this setting?

If you prefer to continue working with the SAM format, we can explore other solutions, but I believe using the -k 25 option should resolve the issue (with SAM format too).

Let me know your preference, and I’ll be happy to assist further!

Best regards,


Thank you for looking into this! I do prefer to use SAM if possible. The -F 5000 was actually an incorrectly copied parameter (I was following a PacBio tutorial, whereas I have Nanopore data). I was actually using -F 15.

I will try with -k 25 instead of --top 5 and report back!