How to create a default accession database?

Dear MEGAN community,

In MEGAN Ultimate Edition there is a “create-accession-db”
application, but I can’t find a description of it in the manual. Only
the CLI help* gives some hints, and not all parameters are clear. I
would appreciate it if you could illustrate the functionality of this
utility with an example. Without mapping-database updates it would
be quite cumbersome to use the Ultimate Edition, which is otherwise very
handy and powerful software.

*Input
-c, --classifications [string(s)]  List of names of classifications. Mandatory option.
-i, --input [string(s)]  List of input .map files for classifications. Mandatory option.
-inf, --infoFiles [string(s)]  List of input .info files for classifications.
-ue, --onlyUltimateEdition [string(s)]  Classifications only for Ultimate Edition.
-n, --nr [string]  NR database file (usually nr.gz, downloaded from NCBI). Mandatory option.
-xm, --extendedMapping  Compute the extended mapping in which every accession occurs (instead of just the first per sequence). Default value: false.

Output
-od, --outputDir [string]  Output directory for mapping files. Default value: …
-os, --outputSuffix [string]  Output mapping files suffix. Default value: -X1.gz.
-o, --output [string]  Output DB file.
-ou, --outputUltimate [string]  Output DB file for MEGAN UE.
-cmf, --createMappingFiles  Create the mapping files. Default value: true.
-cdf, --createDBFiles  Create the database files (assumes mapping files already exist). Default value: true.

Options
-lca, --useLCA [string(s)]  Apply LCA to classifications listed here. Default value(s): ‘Taxonomy’ ‘GTDB’.
-supp, --supportedOnly  Only allow classification names supported by MEGAN. Default value: true.
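The -xm and -lca options are easier to picture with a toy example. NCBI's nr FASTA merges identical protein sequences into one record whose header lists several titles separated by a Ctrl-A (\x01) character, so a single sequence can carry many accessions. The Python sketch below is not MEGAN's implementation: it assumes that "first per sequence" means the first accession in such a header, models the taxonomy as a hypothetical parent map, and uses illustrative helper names (header_accessions, lineage, lca).

```python
from typing import Dict, Iterable, List, Optional

CTRL_A = "\x01"  # separator NCBI nr uses between merged titles in one header


def header_accessions(header: str, extended: bool) -> List[str]:
    """Accessions named in an nr FASTA header line.

    extended=False keeps only the first accession (assumed meaning of the
    default); extended=True keeps all of them (assumed meaning of -xm).
    """
    titles = header.lstrip(">").split(CTRL_A)
    accessions = [title.split(maxsplit=1)[0] for title in titles if title.strip()]
    return accessions if extended else accessions[:1]


def lineage(taxon: int, parent: Dict[int, int]) -> List[int]:
    """Path from taxon up to the root; the root is its own parent."""
    path = [taxon]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lca(taxa: Iterable[int], parent: Dict[int, int]) -> Optional[int]:
    """Lowest common ancestor of taxa in a parent-pointer tree (None if empty)."""
    result: Optional[int] = None
    for taxon in taxa:
        if result is None:
            result = taxon
            continue
        ancestors = set(lineage(result, parent))
        # The first node on taxon's path that is also above `result` is the new LCA.
        result = next(node for node in lineage(taxon, parent) if node in ancestors)
    return result


if __name__ == "__main__":
    # Toy tree with hypothetical IDs: 1 is the root, 2 groups 10 and 11.
    parent = {1: 1, 2: 1, 10: 2, 11: 2, 20: 1}
    acc2taxon = {"WP_1.1": 10, "XP_2.1": 11}  # made-up accessions
    header = ">WP_1.1 protein A [X]" + CTRL_A + "XP_2.1 protein A [Y]"
    print(header_accessions(header, extended=False))
    accessions = header_accessions(header, extended=True)
    print(accessions, lca((acc2taxon[a] for a in accessions), parent))
```

Under this reading, applying LCA means that accessions sharing one nr sequence but pointing to different taxa (10 and 11 in the toy tree) are resolved to their lowest common ancestor (2).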

Most of the options are clear, but we ran into some issues when using the tool.

Thank you!
Gaboca

I plan to provide an update in the near future (end of October).

Here is how I run the program:

megan6ue/tools/utils/create-accession-db \
  -c Taxonomy GTDB SEED KEGG \
  -i /Users/huson/data/db/TAXONOMY.map \
     /Users/huson/data/db/GTDB.map \
     /Users/huson/data/db/SEED.map \
     /Users/huson/data/db/KEGG.map \
  -o /Users/huson/data/db/map.db \
  --nr /Users/huson/data/db/nr.gz \
  -ue KEGG

Each of the .map files contains a mapping from reference sequence accessions to an ID in the corresponding classification.
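To make the expected input concrete, here is a minimal Python sketch that loads such a file into a dictionary. It assumes, without confirmation in this thread, that each line holds one accession and one classification ID separated by a tab; read_map is a hypothetical helper, not part of MEGAN, and IDs are kept as strings to avoid assuming their type.

```python
import io
from typing import Dict, TextIO


def read_map(handle: TextIO) -> Dict[str, str]:
    """Load 'accession<TAB>classification-ID' lines into a dict.

    Assumed layout (not confirmed for MEGAN): two tab-separated columns.
    Blank lines and '#' comment lines are skipped; extra columns are ignored.
    """
    mapping: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two tab-separated columns: {line!r}")
        mapping[fields[0]] = fields[1]
    return mapping


if __name__ == "__main__":
    # Toy input with made-up accessions and IDs, for illustration only.
    sample = io.StringIO("WP_000001.1\t562\nWP_000002.1\t1280\n")
    print(read_map(sample))
```

For a real file you would pass open(path) or gzip.open(path, "rt") instead of the in-memory sample.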

Thank you for the detailed description. I also tested the “create-accession-db” script (I only want to create a taxonomy database), but the following SQLite error stops the process at the merging step:

org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)

Detailed log:

./create-accession-db -c Taxonomy -i accessionmap202309.map -o accessionmap202309.db --nr nr.faa -v --threads 32
CreateMappingDB - Create MappingDB files for MEGAN
Version MEGAN Ultimate Edition (version 6.25.2, built 13 Sep 2023)

Java version: 20.0.2; max memory: 375G
Parsing input files: accessionmap202309.map
Processing file: accessionmap202309.map
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (559.7s)
Taxonomy:1,193,208,055 from file: accessionmap202309.map
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Using LCA for Taxonomy
Processing file: nr.faa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (4,569.9s)
Taxonomy: 380,349,598
Creating mappings table: init
Merging all:
Caught:
org.sqlite.SQLiteException: [SQLITE_TOOBIG] String or BLOB exceeds size limit (statement too long)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.DB.newSQLException(DB.java:1179)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.DB.newSQLException(DB.java:1190)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.DB.throwex(DB.java:1150)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.NativeDB.prepare_utf8(Native Method)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.NativeDB.prepare(NativeDB.java:126)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.core.DB.prepare(DB.java:264)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.jdbc3.JDBC3Statement.lambda$execute$0(JDBC3Statement.java:51)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.jdbc3.JDBC3Statement.withConnectionTimeout(JDBC3Statement.java:454)
at org.xerial.sqlitejdbc@3.42.0.0/org.sqlite.jdbc3.JDBC3Statement.execute(JDBC3Statement.java:40)
at megan/megan.accessiondb.CreateAccessionMappingDatabase.execute(CreateAccessionMappingDatabase.java:355)
at megan6u/megan6u.tools.utils.CreateMappingDB.A(Unknown Source)
at megan6u/megan6u.tools.utils.CreateMappingDB.A(Unknown Source)
at megan6u/megan6u.tools.utils.CreateMappingDB.main(Unknown Source)