I am trying to meganize my .daa file, and it always gets stuck at “Writing 80%”.
The .daa file was produced by running DIAMOND on Prodigal-predicted genes. Prodigal was run on a metagenome assembly of ca. 60 environmental metagenome samples. The .daa file is 120 GB in size. I am running the daa-meganizer tool with 256 GB of RAM allocated to the job. It stays at Writing 80% indefinitely until I cancel the job or it times out on the compute cluster (after 10 days).
Here’s the command I use:
daa-meganizer -i prodigal.daa -mdb megan-map-Feb2022-ue.db -t 16 --only KEGG
I have tried -cs -256000 and -cs -100000 and it still gets stuck at 80% (not sure if that is the issue here).
Here’s the log:
Version MEGAN Ultimate Edition (version 6.24.5, built 13 Nov 2022)
Author(s) Daniel H. Huson
Copyright (C) 2022 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 184.108.40.206
Functional classifications to use: KEGG
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading kegg.map: 25,480
Loading kegg.tre: 68,226
Annotating DAA file using FAST mode (accession database and first accession per line)
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (442.6s)
10% 20% 30% 40% 50% 60% 70% 80%
Here are some example sequence headers from the Prodigal FASTA file that was used with DIAMOND. Each header starts with a > symbol, but it doesn’t show up on this message board. I wonder if the issue might be related to the long header names? Each header is a single line, but may wrap onto two lines on this message board because it is too long to display on one.
k141_0_1 # 1 # 207 # -1 # ID=1_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.420
k141_13676352_1 # 3 # 416 # 1 # ID=3_1;partial=11;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.657
k141_3419088_1 # 1 # 129 # -1 # ID=4_1;partial=10;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.674
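To test whether the long headers matter, one thing I could try is trimming each header to its first whitespace-delimited token (e.g. k141_0_1) and re-running DIAMOND on the trimmed file. Below is a minimal sketch of that idea. The function name shorten_headers and the sample record are hypothetical, and I have not confirmed that header length is actually the cause.

```shell
# Hypothetical helper: keep only the first whitespace-delimited token
# of each FASTA header line; sequence lines pass through unchanged.
# Reads the file(s) given as arguments, or stdin if none are given.
shorten_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Small demo on a made-up record (the sequence "MKT" is a placeholder,
# not real data from my assembly).
tmp=$(mktemp)
printf '%s\n' '>k141_0_1 # 1 # 207 # -1 # ID=1_1;partial=11' 'MKT' > "$tmp"
shorten_headers "$tmp"
rm -f "$tmp"
```

On the real data this would be something like shorten_headers prodigal.faa > prodigal.short.faa (file names assumed), followed by a fresh DIAMOND run on the shortened file.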
Do you know why it might get stuck at Writing 80%? Is there something specific happening at this step?