I’ve written a wrapper script that runs MEGAN6 on my long-read SAM files. All other files complete fine, but this one fails with the error pasted below.

The SAM file was obtained by mapping the FASTQ to a DIAMOND database of NCBI nr, roughly following this workflow for Nanopore long reads…

Any debugging tips appreciated. Thanks!

Current SAM file: bams/aligned/NCBI_nr/barcode03.sam
Reads file:   fastqs/decontaminated/barcode03.fastq
Output file:  megan_diamond/rmas/barcode03.rma
Classifications: Taxonomy, SEED, EGGNOG, GTDB, EC, INTERPRO2GO
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file barcode03.sam
Parsing file: bams/aligned/NCBI_nr/barcode03.sam
Input domination filter: MinPercentCoverToStronglyDominate=90.0 and TopPercentScoreToStronglyDominate=90.0
10% 20% 30% Caught:
org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@ Method)
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@$executeQuery$1(
        at org.xerial.sqlitejdbc@
        at org.xerial.sqlitejdbc@
        at megan/megan.accessiondb.AccessAccessionMappingDatabase.getValues(
        at megan/megan.rma6.RMA6FromBlastCreator.parseFiles(
        at megan/
        at megan/
        at megan/
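
One way to narrow this down before sharing the file is to check whether a few reads carry an unusually large number of alignments. This is only a sketch, and it rests on an assumption: that the statement sent to SQLite in `getValues` grows with the number of accessions matched by a read. The stack trace does not confirm that. The function name `top_reads_by_alignments` is made up for illustration and is not part of MEGAN.

```python
from collections import Counter


def top_reads_by_alignments(sam_path, n=10):
    """Return the n (read_name, alignment_count) pairs with the most alignments.

    Assumption: reads with very many matches are the likeliest trigger for an
    oversized SQL statement; this is a guess, not confirmed MEGAN behaviour.
    """
    counts = Counter()
    with open(sam_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # SAM header lines start with '@'; read names never do.
            if line.startswith("@"):
                continue
            fields = line.split("\t", 3)
            # Skip malformed lines and unaligned records (reference name '*').
            if len(fields) < 3 or fields[2] == "*":
                continue
            counts[fields[0]] += 1
    return counts.most_common(n)
```

Calling `top_reads_by_alignments("bams/aligned/NCBI_nr/barcode03.sam")` and comparing the result with the same call on a file that completes fine would show whether barcode03 has outlier reads. If it does, extracting just those reads into a small SAM file would also give a much smaller test case than the full file.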

Hi @bioinfodonk,

Would it be possible to share this file? I also recommend using MEGAN7. You can upload the file to a drive and share the link with us.

Best regards,

Hi @Anupam,

The files are very large (~20 GB each for the FASTQ and the SAM), but I’m working on it. Unfortunately, using MEGAN7 isn’t currently possible in my pipeline.

Appreciate any tips you might have.

Hello @Anupam,

Here’s the link: Dropbox

Thanks @bioinfodonk, will update you soon.
