SQLITE_TOOBIG error from daa2rma

Hello,

I am processing a metagenome assembly using the commands below:
diamond blastx -p 64 -d nr.dmnd -q RIMA/RIMA_400bp.fna --long-reads -f 100 --out RIMA/RIMA.blast.daa
daa2rma -i RIMA/RIMA.blast.daa -o RIMA.rma --paired false -lg true -mdb ./public_db/megan-map-Feb2022.db -t 64 -ram readCount -supp 0
and got the following error:

Version   MEGAN Community Edition (version 6.24.22, built 12 Apr 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map:     8,200
Loading ec.tre:     8,204
Loading eggnog.map:    30,875
Loading eggnog.tre:    30,986
Loading gtdb.map:   240,103
Loading gtdb.tre:   240,107
Loading interpro2go.map:    14,242
Loading interpro2go.tre:    28,907
Loading seed.map:       961
Loading seed.tre:       962
In DAA file:  /flash/HusnikU/Jinyeong/decontamination_RIMA/diamond_megan/RIMA/RIMA.blast.daa
Output file:  RIMA.rma
Classifications: Taxonomy, SEED, EGGNOG, GTDB, EC, INTERPRO2GO
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file RIMA.blast.daa
Parsing file: /flash/HusnikU/Jinyeong/decontamination_RIMA/diamond_megan/RIMA/RIMA.blast.daa
Input domination filter: MinPercentCoverToStronglyDominate=90.0 and TopPercentScoreToStronglyDominate=90.0
Caught:
org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.DB.newSQLException(DB.java:1135)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.DB.newSQLException(DB.java:1146)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.DB.throwex(DB.java:1106)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.NativeDB.prepare_utf8(Native Method)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.NativeDB.prepare(NativeDB.java:122)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.core.DB.prepare(DB.java:264)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.jdbc3.JDBC3Statement.lambda$executeQuery$1(JDBC3Statement.java:75)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.jdbc3.JDBC3Statement.withConnectionTimeout(JDBC3Statement.java:429)
	at org.xerial.sqlitejdbc@3.39.3.0/org.sqlite.jdbc3.JDBC3Statement.executeQuery(JDBC3Statement.java:73)
	at megan/megan.accessiondb.AccessAccessionMappingDatabase.getValues(AccessAccessionMappingDatabase.java:222)
	at megan/megan.rma6.RMA6FromBlastCreator.parseFiles(RMA6FromBlastCreator.java:257)
	at megan/megan.tools.DAA2RMA6.createRMA6FileFromDAA(DAA2RMA6.java:361)
	at megan/megan.tools.DAA2RMA6.run(DAA2RMA6.java:327)
	at megan/megan.tools.DAA2RMA6.main(DAA2RMA6.java:67)

What happened here, and how can I resolve it? Thank you for your help!

Sincerely,

Cong

Could you please try using the program daa-meganizer rather than daa2rma and let me know whether the problem still occurs there.
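For reference, a daa-meganizer call analogous to the daa2rma command above might look roughly like the line below; the option names are assumed to mirror those of daa2rma and should be checked against daa-meganizer -h:

daa-meganizer -i RIMA/RIMA.blast.daa -mdb ./public_db/megan-map-Feb2022.db -lg -t 64

Unlike daa2rma, this writes the classification data into the DAA file itself rather than producing a separate RMA file.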

I have looked into this. I have decreased the number of items fetched from the mapping database per call from 10,000 to 5,000. If the problem persists even with this smaller value, then in the next release you will be able to set the number of items fetched per call to an even smaller number by adding a line like

AccessionChunkSize=1000

to the properties file Megan.def.
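On a cluster where MEGAN is driven entirely from the command line, the entry could simply be appended to the properties file; the path below is only a placeholder, as the actual location of Megan.def depends on your installation:

echo "AccessionChunkSize=1000" >> /path/to/megan/Megan.def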

I will upload a new release today.

Hi Daniel,

Thank you! I am using MEGAN on a cluster and have to do everything on the command line, so it is tricky to use daa-meganizer. Looking forward to the new release.

Sincerely,

Cong

Hi Daniel,

Thank you for your effort. I tested one dataset that crashed yesterday. daa2rma & rma2info still failed even when I set AccessionChunkSize=1, but daa-meganizer + daa2info worked.
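For readers hitting the same issue, the route that worked was to meganize the DAA file and then export the counts with daa2info. A sketch of the export step, with option names assumed from the tool's help output (please verify with daa2info -h):

daa2info -i RIMA/RIMA.blast.daa -c2c Taxonomy -o RIMA.taxonomy.txt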

Sincerely,

Cong

Please don't use the --long-reads option when running DIAMOND.

Rather, you should specify:

--range-culling -k 25 -F 15

This will run a long-read alignment and report at most 25 alignments in any region.

Using --long-reads is a shortcut for

--range-culling --top 10 -F 15

and the problem with this is that DIAMOND may return thousands of alignments for any given region of the long read, leading to potentially huge files and downstream performance issues.
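Putting this together with the command from the original post, the revised DIAMOND call would look something like the line below; everything is copied from the original invocation except that --long-reads is replaced by the explicit flags:

diamond blastx -p 64 -d nr.dmnd -q RIMA/RIMA_400bp.fna --range-culling -k 25 -F 15 -f 100 --out RIMA/RIMA.blast.daa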