Meganizing stuck at Writing 80%

I am trying to meganize my .daa file, and it always gets stuck at “Writing 80%”.
The .daa file comes from running DIAMOND on Prodigal-predicted genes; Prodigal was run on a metagenome assembly of ca. 60 environmental metagenome samples. The .daa file is 120 GB in size. I am running daa-meganizer with 256 GB RAM dedicated to the job, and it hangs at Writing 80% indefinitely until I cancel the job or it times out on the computer cluster (after 10 days).
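For context, the pipeline up to this point looks roughly like this (a sketch, not my exact commands; file names are placeholders and "nr" stands for whatever reference was built with diamond makedb):

$ prodigal -i assembly.fa -o genes.gbk -a genes.faa -p meta          # predict genes on the metagenome assembly
$ diamond blastp -d nr -q genes.faa -o prodigal.daa -f 100 -p 16     # align proteins; outfmt 100 writes a DAA file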

Here’s the command I use:

daa-meganizer -i prodigal.daa -mdb megan-map-Feb2022-ue.db -t 16 --only KEGG

I have tried -cs -256000 and -cs -100000, and it still gets stuck at 80% (not sure whether the cache size is the issue here; if that value is handed straight to SQLite's cache_size pragma, a negative number is a size in KiB, so -256000 would be roughly 256 MB of cache).
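While it hangs I have been checking whether anything is still happening, roughly like this (the pgrep pattern relies on the DAAMeganizer class name visible in the wrapper script; jstack needs a full JDK):

$ pid=$(pgrep -f DAAMeganizer)          # find the meganizer's Java process
$ ls -l prodigal.daa                    # repeat over time; the file size stops growing once it hangs
$ jstack $pid > meganizer_threads.txt   # thread dump, in case it shows where the writer is blocked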

Here’s the log:

Version MEGAN Ultimate Edition (version 6.24.5, built 13 Nov 2022)
Author(s) Daniel H. Huson
Copyright (C) 2022 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 18.0.2.1
Functional classifications to use: KEGG
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading kegg.map: 25,480
Loading kegg.tre: 68,226
Meganizing: prodigal.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (442.6s)
Writing
10% 20% 30% 40% 50% 60% 70% 80%

Here are a few example sequences from the Prodigal FASTA file that went into DIAMOND (each header is a single line; it may just wrap in the display). I wonder whether the issue might be related to the long header names?

>k141_0_1 # 1 # 207 # -1 # ID=1_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.420
AATCATAACGACGGAAGCGCCGGTGATAGTCTTTTTGCCAATTCCTGGACGGTGCCGCCTGTTGAAGAAAGCAACTACTATATCGACCTTCAGATTACACGTGTAGATTCGGATACCGTCGTTAATCATTTGAATAATATGGCTCTCTTTACAACAATCGGCCCGGTCGTGCTGGATAGCATTTCCTGTATAAAAACATTTACATAT
>k141_13676352_1 # 3 # 416 # 1 # ID=3_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.657
GGCGGGACCTTCAGCTTCGACGAAGTGAGAGATCAGATTCGCGAAACGCTGGCAGGCGAGAAGCGGCGAGAAGCGGCCCTCGAAGCGGCCCGAAGCCAGTGGGCCACGCTGGATACAGGCATCTCCCTCGAGGACGCGGCCGAGCGGCTCGGCTGGTCGATCGGCACGGCCGGTCCGTTCAACCGCCGACAGTTTGCAGCCGGACTCGGCCGCAACACCGAAGCCATCGGAGCGGCATTTGCAGCCCCCGTGGGGCAAGCCGTCGGTCCCCTGAACGCGGACGACGCGGTCGTATTTCTGCGGGTGGACGACCGTACACAGGCGAATCCCGAGTTGTTCGTGGCCGTCCGGGAGCAGCTCAGATCGCAGATGCAGATGCAGGCGTCGCAGGCGAACGTCAATAACTGGATCGAG
>k141_3419088_1 # 1 # 129 # -1 # ID=4_1;partial=10;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.674
ATGCTGATTCGTGCTCTTGGAGTCGTCGGCGTCGTGAGTCTGGTCACAATGGCCGCAGTCGCCACAGGGCGCGATGGCCTGACAGGACAGGCCCAGCAGGGCCCGGCGTACGACTCCGCTCGCGCCTGG

Do you know why it might get stuck at Writing 80%? Is there something specific happening at this step?

One guess is that you might be running out of memory. The Java part of MEGAN uses up to the amount of memory you specified during installation, but the SQLite database access that reads the mapping DB uses additional memory. I am still working on this, trying to figure out how best to control how much memory is used there. Did you, or could you, check how much memory the program is using in total when it gets stuck? Thank you.
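Something along these lines would give me the number I am after (a sketch; the pgrep pattern assumes the DAAMeganizer class name from the wrapper script):

$ pid=$(pgrep -f DAAMeganizer)
$ ps -o pid,rss,vsz,comm -p $pid    # RSS is the resident memory in KiB, including native SQLite allocations
$ top -b -n 1 -p $pid | tail -n 2   # one snapshot of CPU and memory for the process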

Hi Daniel,

The jobs to which I dedicated 256 GB RAM used around 110 GB. I have also tried to meganize my Prodigal file in a job with 1 TB RAM, but it still got stuck at “Writing 80%” and eventually timed out. That was a few months ago, and I can no longer retrieve how much memory that job used.

Dear Daniel,

Not sure whether this is a bug report or a user question, but I have an almost identical issue at the moment:

daa-meganizer is stuck at 90% writing; here are the last few lines from the meganizer log:

Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (504.0s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90%

After that, the meganizer runs “forever” with some load from Java processes on the server, but apparently without making progress.

I am working on 144 samples (MEGAHIT-assembled PE short reads, gene prediction with Prodigal, alignment against nr with DIAMOND). I have successfully used the same pipeline before on 60 metagenome samples in another, very similar project. In the current project, some daa files meganize successfully and some get stuck at 90% writing.

In each case, DIAMOND finishes without complaints (the diamond log looks fine) and I can “diamond view” all daa files (see the check below). The daa files are around 30 GB, give or take.
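For the record, this is the sanity check I run on each daa file (a sketch; the file name is a placeholder):

$ diamond view --daa sample.daa | head   # prints the first alignments in BLAST tabular format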

Memory should not be an issue (-Xmx128000M), as it is never fully occupied by daa-meganizer. I also tried with 256 GB and 512 GB on two different platforms:
Fedora 37, kernel 6.2.14-200.fc37.x86_64, Intel CPU, 1 TB RAM, MEGAN CE 6.24.22
Ubuntu 22.04.2 LTS, kernel 5.15.0-73-generic, AMD Epyc CPU, 512 GB RAM, MEGAN CE 6.24.23

To narrow down the issue, I compared successful and failed samples starting from the gene-predicted faa files, but could not find obvious differences (FASTA headers and sequence data look reasonable; I ran DIAMOND again and checked the log thoroughly, made sure I can view the daa files, etc.).

At this point, I am lost but itching to know what exactly happens at 90% writing.

How can I further narrow down this problem?
What other information/data can I provide to solve this?

Best,
Ralf

Hello Ralf, hello Daniel,

I'm currently stuck on the exact same issue!

Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (19,521.5s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% /path/to/megan/tools/daa-meganizer: line 44: 202082 Killed                  $java $java_flags --module-path=$modulepath --add-modules=megan megan.tools.DAAMeganizer $options
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=102177.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Some of my files also get stuck at 90% during writing, while others are meganized without problems. It is not even the largest files that fail; it is persistently the same ones.
I monitored a few processes closely, and at the time they failed the memory consumption was about 110 GB of the 180 GB available. The successful samples report peak memory between 40 and 100 GB. (See the monitoring commands below.)
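For the monitoring I used Slurm's accounting data, roughly like this (the job ID is taken from the log above; seff is an optional contrib tool that may not be installed everywhere):

$ sacct -j 102177 --format=JobID,MaxRSS,ReqMem,State   # peak resident memory per job step
$ seff 102177                                          # memory-efficiency summary, if available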

My daa files are around 20 GB each, and I'm using MEGAN CE v6.24.23, but I also tried meganizing with version 6.21.1, with the same outcome.

These are the current settings for daa-meganizer:

Mode
	--longReads: false
Parameters
	--classify: true
	--minScore: 70.0
	--maxExpected: 0.01
	--minPercentIdentity: 0.0
	--topPercent: 10.0
	--minSupportPercent: 0.01
	--minSupport: 1
	--minPercentReadCover: 0.0
	--minPercentReferenceCover: 0.0
	--minReadLength: 0
	--lcaAlgorithm: naive
	--lcaCoveragePercent: 100.0
	--readAssignmentMode: readCount
Classification support:
	--mapDB: /beegfs/common/data/MIC_databases/megan/megan-map-Feb2022.db
Deprecated classification support:
	--parseTaxonNames: true
	--firstWordIsAccession: true
	--accessionTags: gb| ref|
Other:
	--threads: 20
	--cacheSize: -10000
	--propertiesFile: /home/wende/.MEGAN.def
	--verbose: true
Version   MEGAN Community Edition (version 6.24.23, built 9 May 2023)
...
Java version: 20.0.1; max memory: 180G

Like Ralf, I'm happy to provide additional information and thankful for any tips or help.

Best,
Sonja

Hi Sonja and Daniel,

Some additional information:

  • daa2info works on daa files where daa-meganizer failed (37 out of 144 samples), although it obviously cannot show any assignments
  • daa2rma also makes assignments based on the megan-map mapping file (I was not aware of this)
  • daa2rma seems to work where daa-meganizer fails (tried on a “failed” sample)
  • the number of assignments is almost as high as with daa-meganizer (compared with a successfully meganized sample)
  • for my pipeline, this looks suitable, since I am just extracting assignments (paths and accession numbers) and stitching together count tables afterwards
  • rma2info seems to provide the same arguments for that as daa2info
  • no need to alter my pipeline and scripts too much, which is good
  • I will now start processing all 144 samples using daa2rma instead of daa-meganizer (see the sketch after this list)
  • I am still curious to know what is going on with these 37 “failed” samples or daa-meganizer :wink: (and happy to provide additional information)
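Roughly what I am running now, in case it is useful (a sketch with placeholder file names; I am assuming daa2rma accepts the same -mdb mapping file, which matches what I observed, and that the rma2info arguments mirror daa2info's):

$ daa2rma -i failed_sample.daa -o failed_sample.rma -mdb megan-map-Feb2022.db -t 16
$ rma2info -i failed_sample.rma -r2c Taxonomy > failed_sample_tax.tsv   # read-to-class table; see rma2info -h for name/path output options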

I hope this information is also helpful for you, @Solala, to move on in your pipeline.

Best,
Ralf

Hey Ralf,

many thanks for sharing your insights!
I'm now trying daa2rma, but as expected it's very slow, and it also seems to use fewer CPUs than I provided.
Still, that sounds like a good workaround if it ever finishes.
Otherwise I think I'll just subsample before DIAMOND, along the lines of the sketch below.
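If it comes to that, the subsampling could look something like this (a sketch; seqtk is one option, and the fraction and seed are arbitrary):

$ seqtk sample -s42 genes.faa 0.5 > genes_sub.faa                # keep a random 50% of the predicted genes, fixed seed
$ diamond blastp -d nr -q genes_sub.faa -o genes_sub.daa -f 100  # re-align; outfmt 100 writes a DAA file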

Best, Sonja

I don't know whether this helps, but here is how much memory was used when it was stuck.

$ top
top - 10:10:17 up 18 days, 12 min, 1 user, load average: 17.99, 17.97, 16.48
Tasks: 477 total, 7 running, 470 sleeping, 0 stopped, 0 zombie
%Cpu(s): 89.5 us, 1.5 sy, 0.0 ni, 9.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65708672 total, 5848156 free, 59584708 used, 275808 buff/cache
KiB Swap: 67042300 total, 34280684 free, 32761616 used. 5686716 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17234 tgangar 20 0 72.2g 40.2g 1960 S 887.1 64.1 2031:31 java

RunTime=11:33:51 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2023-06-22T10:10:09 EligibleTime=2023-06-22T10:10:09
AccrueTime=2023-06-22T10:10:09
StartTime=2023-06-22T22:36:52 EndTime=2023-06-27T22:36:52 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-06-22T22:36:52
Partition=standard AllocNode:Sid=gpu-login:27403
ReqNodeList=(null) ExcNodeList=(null)
NodeList=cn[005,047,087,125]
BatchHost=cn005
NumNodes=4 NumCPUs=40 NumTasks=40 CPUs/Task=1 ReqB:S:C:T=0:0::
TRES=cpu=40,node=4,billing=40
Socks/Node=* NtasksPerN:B:S:C=10:0:: CoreSpec=*
MinCPUsNode=10 MinMemoryNode=60G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/scratch/user/meganizer_test.sh
WorkDir=/scratch/user
StdErr=/scratch/user/job.%J.err
StdIn=/dev/null
StdOut=/scratch/user/job.%J.out
Power=

Version MEGAN Community Edition (version 6.24.23, built 9 May 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.1; max memory: 50G
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: …/Compost_R1_pair.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (9,534.6s)
Writing
10% 20% 30% 40% 50% 60% 70% 80%

Tried a second time:

Version MEGAN Community Edition (version 6.24.23, built 9 May 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.1; max memory: 50G
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: …/Compost_R1_pair.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (12,142.7s)
Writing
10% 20% 30% 40% 50% 60% 70% 80%

This has finally been fixed; see release 6.25.1, which I will upload later today.


Dear all,

I am trying to meganize two daa files of 21.6 and 21.7 GB each, so I set a maximum memory for MEGAN of 500 GB. However, the process keeps getting killed during the annotation and sometimes at the beginning of the writing phase. I have been trying with both 6.25.9 and 6.25.7. Here is an example log from my last run:

Version MEGAN Community Edition (version 6.25.7, built 1 Dec 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 500G
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: LM002_AF_A_diamond_alignment.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90%

Thank you very much.
Best regards,
Luis


Hi everyone, I deleted my last comment because I was finally able to run the meganizer. The total memory it used for a 28 GB .daa file was 78 GB, so increasing the Java heap space to 128 GB worked. I also increased the number of threads to -t 256 and the cores to 32 to avoid any problems. (A sketch of the heap change is below.)
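For anyone else hitting this, the heap increase itself is just a matter of editing the vmoptions file in the MEGAN install directory before rerunning (paths and file names are placeholders; I am assuming the command-line tools read the same vmoptions file as the GUI, which matched my experience):

$ grep Xmx /path/to/megan/MEGAN.vmoptions   # shows the current heap limit
-Xmx128G
$ daa-meganizer -i sample.daa -mdb megan-map-Feb2022.db -t 256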