NullPointerException error

Hello

It’s my first time analysing data with MEGAN. I’ve just attempted to meganize my files with daa-meganizer; however, I keep getting the following errors.

Version MEGAN Ultimate Edition (version 6.22.2, built 10 Mar 2022)
Author(s) Daniel H. Huson
Copyright © 2022 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 17.0.2
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: R4V1_2603_S1__dfopt_merged.daa.assembled_unassembledforward.fastq.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (70.7s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% Caught:
java.lang.NullPointerException
at java.base/java.lang.System.arraycopy(Native Method)
at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:129)
at megan/megan.daa.io.DAAModifier.appendBlocks(DAAModifier.java:134)
at megan/megan.daa.io.DAAModifier.appendBlocks(DAAModifier.java:147)
at megan/megan.daa.DAAReferencesAnnotator.apply(DAAReferencesAnnotator.java:245)
at megan/megan.daa.Meganize.apply(Meganize.java:60)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:262)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:62)
at megan6u/megan6u.tools.DAAMeganizer.main(Unknown Source)

The command I’ve been using is

for x in $(ls *.daa) ; do daa-meganizer -t 50 -i $x -mdb /Data/diamondDB/megan-map-Feb2022.db ;done 2>> megan_output.log    
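For anyone running a similar batch, a variant of the same loop with quoting and a per-file marker makes it easier to spot which file triggers the exception (a sketch only; same paths and options as above):

```shell
# Same batch, but each file gets a header line in the log and an
# explicit FAILED marker if daa-meganizer exits with an error.
for x in *.daa; do
    echo "=== $x ===" >> megan_output.log
    daa-meganizer -t 50 -i "$x" \
        -mdb /Data/diamondDB/megan-map-Feb2022.db 2>> megan_output.log \
        || echo "FAILED: $x" >> megan_output.log
done
```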

I’m unsure what a NullPointerException is. Is it an issue with my installation? I’ve had a look on the forum and it seems these errors can have a few causes; my installation and the files I’m processing are both recent, and I have 128 GB of memory available.
I did have to add the MEGAN tools folder to my PATH to be able to run daa-meganizer, using:

export PATH=$PATH:/localhome/usersoftware/megan/tools

The first few files in my loop threw the NullPointerException within a couple of minutes; however, the file I am currently processing has been running for about 30 minutes now without the error, which makes me wonder whether there is a problem with the pre-processing of some of my files.

I’d appreciate any guidance on the issue.

Thanks

I think I have resolved one of my original issues: reading a bit more into other posts, I learned that MEGAN’s memory is capped by the vmoptions file.
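For reference, the fix was to raise the maximum Java heap in the vmoptions file that ships next to the MEGAN launcher (the exact file name and location may vary by installation; this is a sketch, not a verified path):

```
# e.g. megan/MEGAN.vmoptions (name and location may vary by install)
-Xmx100G
```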

I now seem to be getting a new issue; I assume it may be related to the fact that no assignments were made by the meganizer?

Version MEGAN Ultimate Edition (version 6.22.2, built 10 Mar 2022)
Author(s) Daniel H. Huson
Copyright © 2022 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 17.0.2
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: R4V1_2603_S1__dfopt_merged.assembled.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (33.7s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1.9s)
Binning reads Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using ‘Naive LCA’ algorithm for binning: GTDB
Using Best-Hit algorithm for binning: EC
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (521.8s)
Total reads: 14,903,697
With hits: 14,903,697
Alignments: 353,505,901
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. GTDB: 0
Assig. EC: 4,350,329
Assig. INTERPRO2GO: 0
MinSupport set to: 1490
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3.8s)
Binning reads Syncing
100% (0.0s)
Caught:
java.lang.NullPointerException: Cannot invoke “megan.data.IClassificationBlock.getKeySet()” because “classificationBlock” is null
at megan/megan.daa.connector.DAAConnector.getClassificationSize(DAAConnector.java:115)
at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:413)
at megan/megan.core.Document.processReadHits(Document.java:545)
at megan/megan.daa.Meganize.apply(Meganize.java:97)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:262)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:62)
at megan6u/megan6u.tools.DAAMeganizer.main(Unknown Source)

Megan is not finding any assignments.

There are two possible issues:

  1. There could be something wrong with the mapping db file.
    Could you please try opening it using the command-line program sqlite3 like this:

sqlite3 megan-map-Feb2022-ue.db

and then type:

select * from info;
.quit

Does that work?
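Equivalently, the same check can be run non-interactively (a sketch: it assumes sqlite3 is on the PATH, and the mapping file name should be adjusted to match yours):

```shell
# One-shot version of the interactive sqlite3 check above.
db="megan-map-Feb2022-ue.db"
if [ -f "$db" ]; then
    info=$(sqlite3 "$db" 'SELECT * FROM info;')
else
    info="mapping file not found: $db"
fi
echo "$info"
```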

  2. If that works, then the question is which database you aligned against, and which mapping db file you are using (e.g. megan-map-Feb2022-ue.db). If there is a mismatch between the database against which you aligned the sequences and the mapping db file that you are using, then this would explain why there are 0 assignments.
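The mismatch mechanism can be illustrated with a toy accession lookup (made-up accessions, not MEGAN’s actual code): the mapping db only contains accessions that existed when it was built, so references from a newer database release find no entry and yield no assignment.

```shell
# Toy illustration with made-up accessions: the "Feb 2022" mapping db
# only knows accessions that existed when it was built, so references
# from a newer nr release miss the lookup entirely.
printf '%s\n' WP_000001 WP_000002 > mapping_db_accessions.txt        # mapping db keys
printf '%s\n' WP_000001 WP_000002 WP_999999 > reference_accessions.txt  # newer nr

# Count reference accessions that have an entry in the mapping db.
assigned=$(grep -c -F -x -f mapping_db_accessions.txt reference_accessions.txt)
echo "assigned: $assigned of $(wc -l < reference_accessions.txt)"
```

The larger the gap between the database release used for alignment and the mapping db build date, the larger the missed fraction.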


Hi,
I have the same problem. I did what you suggested and the information looks like this:

sqlite3 megan-map-Feb2022-ue.db
SQLite version 3.29.0 2019-07-10 17:32:03
Enter “.help” for usage hints.
sqlite> select * from info;
general|Created 2022-02-18 06:31:36_ue|346290401
Taxonomy|created: Tue Feb 08 15:30:56 CET 2022 Cite: Benson et al (2005) NAR 33 D34–38.|326245633
GTDB|created: Tue Jul 27 17:36:00 CET 2020 Cite: Parks et al (2018) Nature Biotech, 36:996-1004.|127440021
EGGNOG|created: Fri Jan 28 15:50:07 CET 2022 Cite: Powell et al (2014) NAR 42 D231-239.|7284398
INTERPRO2GO|created: Fri Feb 11 10:56:38 CET 2022 Cite: Mitchell et al (2015) NAR 43 D213-221. The Gene Ontology Consortium (2015) NAR 43 D1049-1056.|30548439
SEED|created: Sat Feb 12 16:42:06 CET 2022 Cite: Overbeek et al (2014) NAR 44 D206-214.|60450877
EC|created: Wed Jan 26 12:57:41 CET 2022 Cite: Bairoch (2000) NAR 28(1):304-5|25017650
KEGG|created: Mon Feb 07 12:54:04 CET 2022 Cite: Kanehisa et al (2014) NAR 42 D199-205.|52217159
sqlite> .quit

The alignment database was made like this:

diamond makedb --in /nr.gz -d /NCBI_nr_2023_conda --threads 40

The blast-files like this:

diamond blastx -d /NCBI_nr_2023_conda -q ${SAMPLENAME}* -o ${SAMPLENAME}.50hits_nr.daa -f 100 -k 50 -e 0.01 -p 20 --max-hsps 1 -b 2 -c 4

and I get this error:

java.lang.NullPointerException: Cannot invoke “megan.data.IClassificationBlock.getKeySet()” because “classificationBlock” is null

I would be grateful for suggestions. Thank you!

The full log:

(base) [dahl@caput TBA_logs]$ less megan_LCA-90440.err
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, KEGG, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading kegg.map: 100,399
Loading kegg.tre: 106,785
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: /mnt/beegfs/micro/MDahl/Seasonal_Forhot/TBA_out/LSU_mRNA/Sort_mRNA/nonLSU/diamond/S404_S17_overlapped_minlen100_LSU_mRNA.fastq.fastq.50hits_nr.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (123.5s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (5.8s)
Binning reads Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using Best-Hit algorithm for binning: KEGG
Using ‘Naive LCA’ algorithm for binning: GTDB
Using Best-Hit algorithm for binning: EC
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (512.2s)
Total reads: 96,474
With hits: 96,474
Alignments: 3,152,516
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 18,426
Assig. KEGG: 0
Assig. GTDB: 70,602
Assig. EC: 23,807
Assig. INTERPRO2GO: 33,734
MinSupport set to: 9
Binning reads Applying min-support & disabled filter to GTDB…
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.3s)
Min-supp. changes: 2,783
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.7s)
Binning reads Syncing
100% (0.2s)
Caught:
java.lang.NullPointerException: Cannot invoke “megan.data.IClassificationBlock.getKeySet()” because “classificationBlock” is null
at megan/megan.daa.connector.DAAConnector.getClassificationSize(DAAConnector.java:120)
at megan/megan.algorithms.DataProcessor.apply(DataProcessor.java:413)
at megan/megan.core.Document.processReadHits(Document.java:548)
at megan/megan.daa.Meganize.apply(Meganize.java:97)
at megan/megan.tools.DAAMeganizer.run(DAAMeganizer.java:255)
at megan/megan.tools.DAAMeganizer.main(DAAMeganizer.java:59)

I’m having a difficult time figuring out what the problem is.
The new release that I will upload later today should print out for which classification the problem occurs.
If you have time, please rerun with the new release 6.25.4 and then let me know what the additional error message is.

Thank you @Daniel, I will update MEGAN on our server and try again!
I will get back to you on this thread with the output.

I am afraid the same error occurs; here is the log:

less megan_LCA-90638.err
Version MEGAN Community Edition (version 6.25.4, built 24 Oct 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 31.3G
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, KEGG, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading kegg.map: 100,399
Loading kegg.tre: 106,785
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: /mnt/beegfs/micro/MDahl/Seasonal_Forhot/TBA_out/LSU_mRNA/Sort_mRNA/nonLSU/S429_S4_overlapped_minlen100_LSU_mRNA.fastq.fastq.50hits_nr.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (164.6s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (5.8s)
Binning reads Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using Best-Hit algorithm for binning: KEGG
Using ‘Naive LCA’ algorithm for binning: GTDB
Using Best-Hit algorithm for binning: EC
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (381.9s)
Total reads: 623,227
With hits: 623,227
Alignments: 19,554,500
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. KEGG: 0
Assig. GTDB: 0
Assig. EC: 29,623
Assig. INTERPRO2GO: 0
MinSupport set to: 62
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.7s)
Binning reads Syncing
100% (0.1s)
Error: getClassificationSize(Taxonomy): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. Taxonomy: 0
Error: getClassificationSize(SEED): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. SEED: 0
Error: getClassificationSize(EGGNOG): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. EGGNOG: 0
Error: getClassificationSize(KEGG): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. KEGG: 0
Error: getClassificationSize(GTDB): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. GTDB: 0
Error: getClassificationSize(EC): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. EC: 0
Error: getClassificationSize(INTERPRO2GO): classificationBlock is null
Known classifications: EC, EC, OG, DB, GO, GG, ED
Class. INTERPRO2GO: 0
Total time: 561.3s
Peak memory: 17.4 of 31.3G
rm: cannot remove ‘/tmp/systemd-private-65a2cd02c0254ffb8f748c5795ed4738-chronyd.service-8sO2QW’: Operation not permitted
rm: cannot remove ‘/tmp/systemd-private-65a2cd02c0254ffb8f748c5795ed4738-cups.service-FlUj1m’: Operation not permitted


@MDahl let me know when you figure out the full taxonomic table export. Are you trying to export a taxonomic table as well?

@Daniel Could you let me know whether you are looking into this issue? It’s a bit of a dead end for me, and I am not sure what I can change to make it work.
It would be very valuable to me if it worked, so I would be very grateful for your effort!

I was told that you need the Ultimate Edition to be able to meganize a .daa file via the command line.

oh… Thanks for letting me know.

If you send me the input file, I will run it in my debugger and will figure out what the issue is. (It is not a CE vs UE issue).

Thank you very much for your offer!
I have sent three examples to your official email address (Uni. Tübingen) via WeTransfer: three compressed DIAMOND blast files and three compressed corresponding meganized files.
Be aware that the file names are the same in the two archives (but not the sizes; the meganized files are smaller). Sorry for the inconvenience.

I also want to let you know that I have also been trying to obtain the taxonomy via the built-in LCA directly in DIAMOND, and have been communicating with Benjamin Buchfink about this; I believe he is a member of your working group, right?
From my perspective, it seems both workflows (LCA via DIAMOND, or DIAMOND → MEGAN) would give me the desired output.

PS. you can also reach me at: dahlm(at)uni-greifswald(dot)de