Meganizing DAA file(s) fails with up-to-date accession mapping file and alternate, new NCBI tre/map file in LR mode

Hi,

Meganizing DAA file(s) fails in long-read mode when I use an up-to-date accession mapping file (abin) together with an alternate, new NCBI tre/map file:

Executing: meganize daaFile='/home/blaize/diamond/n78-01.daa' minScore=50.0 maxExpected=0.01 minPercentIdentity=0.0 topPercent=0.1 minSupportPercent=1.0E-9 lcaAlgorithm=longReads lcaCoveragePercent=51.0 minComplexity=0 useIdentityFilter=false readAssignmentMode=readCount fNames= longReads=true paired=false;
Meganizing file: /home/blaize/diamond/n78-01.daa
Annotating DAA file using EXTENDED mode
Error: Task java.util.concurrent.FutureTask@651f6be2[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@d7e003b[Wrapped task = megan.daa.DA…
Task java.util.concurrent.FutureTask@651f6be2[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@d7e003b[Wrapped task = megan.daa.DAAReferencesAnnotator$$Lambda$361/0x000000008049d578@3e34e3fd]] rejected from java.util.concurrent.ThreadPoolExecutor@61699025[Shutting down, pool size = 25, active threads = 21, queued tasks = 0, completed tasks = 5]
Info: Finished meganizing 1 files. ERRORS: 1

However, everything runs smoothly with the old megan-map-Jan2021.db mapping file :frowning:

Any idea?

Thanks: Blaize

Supplementary information:

Creation of the disk-based accession-to-taxonomy mapping file (~16 GB):

Computing map:
Processing file: prot.accession2taxid
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (418.6s)
Building table:
(Bits: 28, buckets: 268,435,456, bucket size: 4)
Sorting map…
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1366.8s)
Writing table…
10% 20% 100% (157.7s)
(Bucket avg size: 3.7, max size: 19, used: 98%)
(Index size: 2,147,483,652, data size: 14,275,501,278)
Merging files
10% 100% (14.3s)
Opening file: acc2tax202112.abin
Size: 999,246,517
Total in: 999,246,518
Total out: 999,246,517
Total time: 1,989s
Peak memory: 185.4 of 330.1G

Taxonomy file generation:

Parsing names.dmp
Distinct taxa: 2385619
Synonyms: 1140133
Writing synonyms.map: 1140133
Parsing nodes.dmp
Nodes: 2385619
Building tree
Tree has 2385619 nodes and 2385618 edges
Writing files
Writing ncbi.tre: done
Writing ncbi.map: 2385619
Writing ncbi.lvl: done
Writing ncbi.info: done
Time: 18s

The Ultimate Edition comes with a tool called “create-accession-db”, which describes itself as: create small and extended protein MappingDB files for MEGAN. However, I cannot find any detailed information about it. For example, what should I enter for the classifications [string(s)] option? Thanks in advance for any information or CLI examples!

Input
-c, --classifications [string(s)] List of names of classifications. Mandatory option.
-i, --input [string(s)] List of input .map files for classifications. Mandatory option.
-inf, --infoFiles [string(s)] List of input .info files for classifications.
-ue, --onlyUltimateEdition [string(s)] Classifications only for Ultimate Edition.
-n, --nr [string] NR database file (usually nr.gz, downloaded from NCBI). Mandatory option.
-xm, --extendedMapping Compute the extended mapping in which every accession occurs (instead of just the first per sequence). Default value: false.
Output
-od, --outputDir [string] Output directory for mapping files. Default value: .
-os, --outputSuffix [string] Output mapping files suffix. Default value: -X1.gz
-o, --output [string] Output DB file.
-ou, --outputUltimate [string] Output DB file for MEGAN UE.
-cmf, --createMappingFiles Create only the mapping files. Default value: true.
-cdf, --createDBFiles Create the database files (assumes mapping files already exist). Default value: true.
Options
-lca, --useLCA [string(s)] Apply LCA to classifications listed here. Default value(s): 'Taxonomy' 'GTDB'.
-supp, --supportedOnly Only allow classification names supported by MEGAN. Default value: true.
Other:
-t, --threads [number] Number of threads. Default value: 8.
-tsm, --tempStoreInMemory Temporary storage in memory for SQLITE. Default value: false.
-tsd, --tempStoreDir [string] Temporary storage directory for SQLITE (if not in-memory).
-v, --verbose Echo commandline options and be verbose. Default value: false.
-h, --help Show program usage and quit.

Here is a description of the main flags:

For each classification that you want to put into a mapping DB, you provide a classification name (-c), an input file (-i) containing mappings of accessions to class IDs (one pair per line, tab-separated), and an info file (-inf) containing a line of info about the classification. You also provide the NCBI-nr database nr.gz (-n) and specify the output DB file using -o. You probably want to set -cmf (create mapping files) to false.
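
The description above can be turned into a small helper script. What follows is only a sketch under stated assumptions, not a confirmed recipe: the file names (taxonomy.map, taxonomy.info, my-mapping.db) are hypothetical placeholders, the classification name Taxonomy is taken from the -lca default in the help text, and passing the boolean as "-cmf false" is an assumption about the option syntax. The script checks that a .map file follows the "accession, tab, class ID" layout and then prints a candidate command line built only from flags listed in the help output.

```python
import shlex


def check_mapping_file(path):
    """Count 'accession<TAB>class-id' lines in a .map file; raise on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2 or not fields[0]:
                raise ValueError(f"{path}:{line_no}: expected two tab-separated fields")
            try:
                int(fields[1])  # class IDs are assumed to be integers
            except ValueError:
                raise ValueError(f"{path}:{line_no}: class ID is not an integer") from None
            count += 1
    return count


def build_command(classifications, nr_file, output_db):
    """Build the argument list for create-accession-db.

    classifications: list of (name, map_file, info_file) tuples,
    one tuple per classification, kept in the same order for -c, -i and -inf.
    """
    names = [c[0] for c in classifications]
    maps = [c[1] for c in classifications]
    infos = [c[2] for c in classifications]
    return [
        "create-accession-db",
        "-c", *names,
        "-i", *maps,
        "-inf", *infos,
        "-n", nr_file,
        "-o", output_db,
        "-cmf", "false",  # assumption: booleans are passed as an explicit value
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own files.
    cmd = build_command(
        [("Taxonomy", "taxonomy.map", "taxonomy.info")],
        nr_file="nr.gz",
        output_db="my-mapping.db",
    )
    print(shlex.join(cmd))
```

Running check_mapping_file on each .map file first catches layout problems (missing tab, non-numeric class ID) before the long database build starts. The printed command is meant to be reviewed against the tool's own --help output before it is executed.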