MEGAN CE: daa-meganizer using only the older default ncbi.tre / ncbi.map

blaize · April 26, 2024, 7:53am

Even when we load the newest alternate taxonomy files (generated with the taxdmp2tree tool) into MEGAN CE, the daa-meganizer CLI command still insists on the default, older taxonomy data. Meganizing via the GUI with the updated NCBI tree, on the other hand, works smoothly. Unfortunately, the tricks used in Create-accession-db don’t work this time.
What could be the solution?
Thanks in advance!

History:
/home/ngs_lab/megan/tools/daa-meganizer -i /home/ngs_lab/temp/*.daa -lg -top 0.1 -sup 1 -lcp 51 -ram readCount -alg longReads -a2t /home/ngs_lab/DB/Megan/accessionmap202402-virus.map -t 32 -v
Meganizer - Prepares (‘meganizes’) a DIAMOND .daa file for use with MEGAN

Java version: 20.0.2; max memory: 193.4G
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading file: /home/ngs_lab/DB/Megan/accessionmap202402-virus.map

Meganizing: /home/ngs_lab/temp/UNSGM-DryLab-2024-02.daa
Meganizing init
Annotating DAA file using EXTENDED mode
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1.3s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0s)
Binning reads Initializing…
Initializing binning…
Using ‘Interval-Union-LCA’ algorithm (51.0 %) for binning: Taxonomy
Binning reads…
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.9s)
Total reads: 1,220
With hits: 929
Alignments: 3,042
Assig. Taxonomy: 782
Binning reads Applying min-support & disabled filter to Taxonomy…
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.3s)
Min-supp. changes: 0
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0s)
Binning reads Syncing
100% (0.0s)
Class. Taxonomy: 288

gregorykrice · June 20, 2024, 5:33pm

Blaize,

Does the user that is running daa-meganizer have a .MEGAN.def file that points to the updated ncbi.map and ncbi.tre files?

What happens if you explicitly specify with daa-meganizer --properties /path/to/.MEGAN.def?

Best,
Greg

blaize · June 21, 2024, 8:42am

Thanks for the idea. I tried it, but unfortunately it doesn’t work, even though the new taxonomy path is in the .MEGAN.def file.

CLI: ./daa-meganizer -i /home/ngs_lab/temp/*.daa -lg -top 0.1 -sup 1 -lcp 51 -ram readCount -alg longReads -a2t /home/ngs_lab/DB/Megan/accessionmap202402-virus.map -t 32 -v -P /home/ngs_lab/.MEGAN.def

Cheers: Blaize

gregorykrice · June 21, 2024, 4:24pm

In /home/ngs_lab/.MEGAN.def

MappingFile=/path/to/updated/ncbi.map
TaxonomyFile=/path/to/updated/ncbi.tre

Can you re-run taxdmp2tree with a recent taxdmp.zip, sending the tree output to a new location (with -t), and point MappingFile and TaxonomyFile in .MEGAN.def to it?

Anupam · June 23, 2024, 7:38pm

I missed this one. What are the names of your new files?

If everything is set up properly, running daa-meganizer should display:

./MEGAN_UI/tools/daa-meganizer:
Loading additional classification ...... from:

Adjusting .MEGAN.def might not be needed.

blaize · June 26, 2024, 10:00am

Unfortunately, the problem persists (in both the MEGAN CE and UI versions). I regenerated and renamed the classification files, moved them to a new folder, and restarted the server, but to no effect.

CLI: ./daa-meganizer -i *.daa -lg -top 0.1 -sup 1 -lcp 51 -ram readCount -alg longReads -a2t /home/ngs_lab/DB/Megan/accessionmap202402-virus.map -t 32 -v

The “Loading additional classification … from:” message is not shown,
but the good old “Loading ncbi.map: 2,396,736 / Loading ncbi.tre: 2,396,740” appears again.

.MEGAN.def sections:
AdditionClassifications=/home/ngs_lab/DB/Megan/ncbi_20240214.tre
MappingFile=/home/ngs_lab/DB/Megan/ncbi_20240214.map
TaxonomyFile=/home/ngs_lab/DB/Megan/ncbi_20240214.tre
TaxonomySynonymsFileLocation=/home/ngs_lab/DB/Megan/accessionmap202402-virus.map
TreeDirectory=/home/ngs_lab/DB/Megan/

I emphasize that the alternate taxonomy function works without any issues via the Java GUI.

Thank you all for your help!
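
A quick way to rule out path or permission problems for the .MEGAN.def entries listed above is to read the keys back from the properties file and test each referenced path. The following is a minimal sketch, not part of MEGAN: `check_megan_def` is a hypothetical helper, and the key names it looks for are simply those quoted in this thread. Note that the listing above spells one key AdditionClassifications, while the .MEGAN.def quoted later by Anupam uses AdditionalClassifications; the sketch checks the latter spelling. It only reports what the file contains and cannot tell which keys daa-meganizer actually consults.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEGAN): print selected keys from a
# MEGAN properties file and report whether each referenced file is readable.
check_megan_def() {
    local def_file="$1" key value
    if [ ! -r "$def_file" ]; then
        echo "cannot read $def_file" >&2
        return 1
    fi
    # Key names are taken from the posts in this thread (an assumption,
    # not an exhaustive list of what MEGAN reads).
    for key in AdditionalClassifications MappingFile TaxonomyFile; do
        # Use the last "key=value" line for this key, if there is one.
        value=$(sed -n "s/^${key}=//p" "$def_file" | tail -n 1)
        if [ -z "$value" ]; then
            echo "$key: not set"
        elif [ -r "$value" ]; then
            echo "$key: $value (readable)"
        else
            echo "$key: $value (NOT readable)"
        fi
    done
}
```

It could be called as `check_megan_def /home/ngs_lab/.MEGAN.def`, run as the same user that runs daa-meganizer.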

Anupam · June 28, 2024, 12:55pm

Hi @blaize,

It still works for me. Would it be possible for you to provide the .tre and .map files?

blaize · July 1, 2024, 3:13pm

Dear Anupam,

Of course, you can access the files via this link, thank you in advance!

https://drive.google.com/drive/folders/1iPn7ytvL0IZWaZiBroRrw1EWonxC7BMq?usp=sharing

Anupam · July 2, 2024, 1:29pm

Dear @blaize,

Thank you for providing the file. I used it with my MEGAN installation, and it was recognized correctly. In the GUI, I went to Edit → Preferences → Add Classification, and selected the .tre file, ensuring all files were in the same directory. After restarting MEGAN and running the daa-meganizer, the file was accessible.

If I want to use the MEGAN GUI for analysis after meganization, I just need to use the new option in the Taxonomy and Function bar.

There are also new options for meganization, such as -a2ncbi_20240214, as listed in the help output below:

/Applications/MEGAN/tools/daa-meganizer 
Loading additional classification NCBI_20240214 from: /Applications/MEGAN/files/ncbi_20240214.tre and /Applications/MEGAN/files/ncbi_20240214.map
Loading additional classification VFDB_WEBSITE from: /Applications/MEGAN/files/vfdb_website.tre and /Applications/MEGAN/files/vfdb_website.map
Loading additional classification VFDB from: /Applications/MEGAN/files/vfdb.tre and /Applications/MEGAN/files/vfdb.map
Loading additional classification BIOSURFDB from: /Applications/MEGAN//files/biosurfdb.tre and /Applications/MEGAN/files/biosurfdb.map
SYNOPSIS
	Meganizer [options]
DESCRIPTION
	Prepares ('meganizes') a DIAMOND .daa file for use with MEGAN
OPTIONS
 Files  
	-i, --in [string(s)]                 Input DAA file(s). Each is meganized separately. Mandatory option.
	-mdf, --metaDataFile [string(s)]     Files containing metadata to be included in files. 
 Mode  
	-lg, --longReads                     Parse and analyse as long reads. Default value: false.
 Parameters  
	-class, --classify                   Run classification algorithm. Default value: true.
	-ms, --minScore [number]             Min score. Default value: 50.0.
	-me, --maxExpected [number]          Max expected. Default value: 0.01.
	-mpi, --minPercentIdentity [number]   Min percent identity. Default value: 0.0.
	-top, --topPercent [number]          Top percent. Default value: 10.0.
	-supp, --minSupportPercent [number]   Min support as percent of assigned reads (0==off). Default value: 0.01.
	-sup, --minSupport [number]          Min support (0==off). Default value: 0.
	-mrc, --minPercentReadCover [number]   Min percent of read length to be covered by alignments. Default value: 0.0.
	-mrefc, --minPercentReferenceCover [number]   Min percent of reference length to be covered by alignments. Default value: 0.0.
	-mrl, --minReadLength [number]       Minimum read length. Default value: 0.
	-alg, --lcaAlgorithm [string]        Set the LCA algorithm to use for taxonomic assignment. Default value: naive Legal values: naive, weighted, longReads
	-lcp, --lcaCoveragePercent [number]   Set the percent for the LCA to cover. Default value: 100.0.
	-ram, --readAssignmentMode [string]   Set the read assignment mode. Default value: alignedBases in long read mode, readCount else
	-cf, --conFile [string]              File of contaminant taxa (one Id or name per line). 
 Classification support:
	-mdb, --mapDB [string]               MEGAN mapping db (file megan-map.db). 
	-on, --only [string(s)]              Use only named classifications (if not set: use all). 
 Deprecated classification support:
	-tn, --parseTaxonNames               Parse taxon names. Default value: true.
	-a2t, --acc2taxa [string]            Accession-to-Taxonomy mapping file. 
	-s2t, --syn2taxa [string]            Synonyms-to-Taxonomy mapping file. 
	-t4t, --tags4taxonomy [string]       Tags for taxonomy id parsing (must set to activate id parsing). 
	-a2bacmet, --acc2bacmet [string]     Accession-to-BACMET mapping file. 
	-s2bacmet, --syn2bacmet [string]     Synonyms-to-BACMET mapping file. 
	-t4bacmet, --tags4bacmet [string]    Tags for BACMET id parsing (must set to activate id parsing). 
	-a2biosurfdb, --acc2biosurfdb [string]   Accession-to-BIOSURFDB mapping file. 
	-s2biosurfdb, --syn2biosurfdb [string]   Synonyms-to-BIOSURFDB mapping file. 
	-t4biosurfdb, --tags4biosurfdb [string]   Tags for BIOSURFDB id parsing (must set to activate id parsing). 
	-a2card, --acc2card [string]         Accession-to-CARD mapping file. 
	-s2card, --syn2card [string]         Synonyms-to-CARD mapping file. 
	-t4card, --tags4card [string]        Tags for CARD id parsing (must set to activate id parsing). 
	-a2ec, --acc2ec [string]             Accession-to-EC mapping file. 
	-s2ec, --syn2ec [string]             Synonyms-to-EC mapping file. 
	-t4ec, --tags4ec [string]            Tags for EC id parsing (must set to activate id parsing). 
	-a2eggnog, --acc2eggnog [string]     Accession-to-EGGNOG mapping file. 
	-s2eggnog, --syn2eggnog [string]     Synonyms-to-EGGNOG mapping file. 
	-t4eggnog, --tags4eggnog [string]    Tags for EGGNOG id parsing (must set to activate id parsing). 
	-a2gtdb, --acc2gtdb [string]         Accession-to-GTDB mapping file. 
	-s2gtdb, --syn2gtdb [string]         Synonyms-to-GTDB mapping file. 
	-t4gtdb, --tags4gtdb [string]        Tags for GTDB id parsing (must set to activate id parsing). 
	-a2interpro2go, --acc2interpro2go [string]   Accession-to-INTERPRO2GO mapping file. 
	-s2interpro2go, --syn2interpro2go [string]   Synonyms-to-INTERPRO2GO mapping file. 
	-t4interpro2go, --tags4interpro2go [string]   Tags for INTERPRO2GO id parsing (must set to activate id parsing). 
	-a2kegg, --acc2kegg [string]         Accession-to-KEGG mapping file. 
	-s2kegg, --syn2kegg [string]         Synonyms-to-KEGG mapping file. 
	-t4kegg, --tags4kegg [string]        Tags for KEGG id parsing (must set to activate id parsing). 
	-a2ncbi_20240214, --acc2ncbi_20240214 [string]   Accession-to-NCBI_20240214 mapping file. 
	-s2ncbi_20240214, --syn2ncbi_20240214 [string]   Synonyms-to-NCBI_20240214 mapping file. 
	-t4ncbi_20240214, --tags4ncbi_20240214 [string]   Tags for NCBI_20240214 id parsing (must set to activate id parsing). 
	-a2seed, --acc2seed [string]         Accession-to-SEED mapping file. 
	-s2seed, --syn2seed [string]         Synonyms-to-SEED mapping file. 
	-t4seed, --tags4seed [string]        Tags for SEED id parsing (must set to activate id parsing). 
	-a2vfdb, --acc2vfdb [string]         Accession-to-VFDB mapping file. 
	-s2vfdb, --syn2vfdb [string]         Synonyms-to-VFDB mapping file. 
	-t4vfdb, --tags4vfdb [string]        Tags for VFDB id parsing (must set to activate id parsing). 
	-a2vfdb_website, --acc2vfdb_website [string]   Accession-to-VFDB_WEBSITE mapping file. 
	-s2vfdb_website, --syn2vfdb_website [string]   Synonyms-to-VFDB_WEBSITE mapping file. 
	-t4vfdb_website, --tags4vfdb_website [string]   Tags for VFDB_WEBSITE id parsing (must set to activate id parsing). 
	-fwa, --firstWordIsAccession         First word in reference header is accession number (set to 'true' for NCBI-nr downloaded Sep 2016 or later). Default value: true.
	-atags, --accessionTags [string(s)]   List of accession tags. Default value(s): 'gb|' 'ref|'.
 Other:
	-t, --threads [number]               Number of threads. Default value: 8.
	-tsm, --tempStoreInMemory            Temporary storage in memory for SQLITE. Default value: false.
	-tsd, --tempStoreDir [string]        Temporary storage directory for SQLITE (if not in-memory). 
	-v, --verbose                        Echo commandline options and be verbose. Default value: false.
	-h, --help                           Show program usage and quit.
AUTHOR(s)
	Daniel H. Huson.

Furthermore, my .Megan.def file looks like this:

AdditionalClassifications=/Applications/MEGAN/files/ncbi_20240214.tre

The only thing that comes to mind is a possible permissions issue (on your system).

Best regards,
Anupam

blaize · July 2, 2024, 3:20pm

Thanks Anupam,

Unfortunately, as before, the CLI version still reports the old NCBI map/tre files, while the accessionmap file is OK:

Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading file: /home/ngs_lab/DB/Megan/accessionmap202402-virus.map

Is it possible that the CLI daa-meganizer is actually using the new taxonomy files listed in .MEGAN.def and in the GUI message window, but incorrectly reporting the number of rows of the ncbi.map file and the number of taxids in the ncbi.tre to the terminal? I also checked the permissions, and everything is right.

Bests: Blaize
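
One rough way to test the question above is to compare the count printed in the terminal with the size of the new mapping file. The sketch below assumes that the number after “Loading ncbi.map:” is close to the number of lines in the file being loaded; that is an assumption, not something confirmed in this thread, and comment or blank lines could shift the count slightly. `compare_map_count` is a hypothetical helper, not a MEGAN tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a count printed by daa-meganizer
# (e.g. "2,396,736") with the number of lines in a mapping file.
compare_map_count() {
    local map_file="$1" reported="${2//,/}" lines   # strip thousands separators
    if [ ! -r "$map_file" ]; then
        echo "cannot read $map_file" >&2
        return 1
    fi
    lines=$(wc -l < "$map_file" | tr -d '[:space:]')
    if [ "$lines" -eq "$reported" ]; then
        echo "match: $lines lines"
    else
        echo "differ: file has $lines lines, log reported $reported"
    fi
}
```

For example, `compare_map_count /home/ngs_lab/DB/Megan/ncbi_20240214.map 2,396,736`. A clear difference would suggest that the counts in the terminal come from a file other than the new one.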

gregorykrice · July 9, 2024, 7:06pm

It looks like daa-meganizer is not using the path in .MEGAN.def for ncbi.map and ncbi.tre, but if ncbi.map and ncbi.tre are in your working directory – i.e. where you are running daa-meganizer – it will work. Try running daa-meganizer in the directory where those files are located, i.e. /home/ngs_lab/DB/Megan/.

Alternatively, create symlinks to them in your working directory and try running daa-meganizer:

ln -sf /home/ngs_lab/DB/Megan/ncbi.map /path/to/working_dir/ncbi.map
ln -sf /home/ngs_lab/DB/Megan/ncbi.tre /path/to/working_dir/ncbi.tre
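
The two commands above assume the updated files are themselves named ncbi.map and ncbi.tre. If they carry a dated name, as with the ncbi_20240214.* files mentioned earlier in the thread, the links can map the dated names onto the default ones. This is a minimal sketch under the assumption, taken from the suggestion above and not verified here, that daa-meganizer picks up ncbi.map and ncbi.tre from the working directory; `link_taxonomy` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link <prefix>.map and <prefix>.tre into a working
# directory under the default names ncbi.map and ncbi.tre.
# Pass an absolute prefix, otherwise the symlinks may dangle.
link_taxonomy() {
    local src_prefix="$1" work_dir="${2:-.}" ext
    for ext in map tre; do
        if [ ! -r "${src_prefix}.${ext}" ]; then
            echo "missing ${src_prefix}.${ext}" >&2
            return 1
        fi
        # -s: symbolic link, -f: replace an existing link of the same name
        ln -sf "${src_prefix}.${ext}" "${work_dir}/ncbi.${ext}"
    done
}
```

For example, `link_taxonomy /home/ngs_lab/DB/Megan/ncbi_20240214 /path/to/working_dir`, followed by running daa-meganizer from that working directory and checking whether the reported counts change.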

blaize · July 10, 2024, 11:53am

@gregorykrice
Thanks for the tip, fortunately the issue has since been solved.