Meganize diamond file for swissprot database

Hey All

I am trying to meganize the DAA file produced by a DIAMOND blastx run against the Swiss-Prot database. I followed the suggestions in the post “Import peptide identification to MEGAN”.
So I created a tab-separated file that maps Swiss-Prot IDs to NCBI taxon IDs:
sp|Q6GZX4|001R_FRG3G 654924
sp|Q6GZX3|002L_FRG3G 654924
sp|Q197F8|002R_IIV3 345201
sp|Q197F7|003L_IIV3 345201
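
Since the mapping file is meant to be tab-separated, a quick format check can rule out stray spaces before meganizing. This is only a minimal sketch and not part of MEGAN: `check_mapping` is a hypothetical helper name, and the demo file just reuses two of the lines above, one of them deliberately written with a space instead of a tab.

```shell
# Hypothetical helper: report lines that are not "<id><TAB><numeric taxid>".
# Returns non-zero if at least one malformed line is found.
check_mapping() {
  awk -F'\t' 'NF != 2 || $2 !~ /^[0-9]+$/ { bad++; printf "line %d: %s\n", NR, $0 }
              END { exit (bad > 0) }' "$1"
}

# Small demo file: the first line is tab-separated, the second uses a space.
demo=$(mktemp)
printf 'sp|Q6GZX4|001R_FRG3G\t654924\n' > "$demo"
printf 'sp|Q6GZX3|002L_FRG3G 654924\n' >> "$demo"

check_mapping "$demo" || echo "mapping file has malformed lines"
```

Run against the full mapping file, the helper prints every offending line with its line number, so a formatting problem can be separated from a memory problem.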

Questions-

  1. When I use this mapping file with the command-line version of MEGAN, I get a “java heap size” error, even after increasing the Java -Xmx flag to 500G.
    The idmapping file is 6.4G in size.

  2. When I try to meganize the DAA file in the GUI version, the daa-meganizer dialog box doesn’t accept my mapping file.

Any suggestions?

Please send me the first 1000 lines of the mapping file and I will look into this.

Hey Daniel

Thank you for replying. Here is a link to the first 1000 lines of the mapping file: https://github.com/Jigyasa3/errors/blob/master/1000lines_swissprot_ncbi.tab

When using daa-meganizer, you need to supply this file as a “synonyms file” using the option

--syn2taxa
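
As a rough illustration of how that option fits into a call, here is a small dry-run sketch that prints the command rather than executing it. The wrapper name `meganize_cmd` and both file names are placeholders, and the use of `--in` for the DAA file is an assumption based on how blast2rma and rma2info take their input; check `daa-meganizer -h` for the options of your installed version.

```shell
# Hypothetical wrapper: print the daa-meganizer call instead of running it,
# so the arguments can be reviewed first.
# Assumption: --in names the DAA file, as it does for blast2rma and rma2info.
meganize_cmd() {
  printf 'daa-meganizer --in %s --syn2taxa %s\n' "$1" "$2"
}

# Placeholder file names, not taken from this thread.
meganize_cmd reads.daa swissprot2taxid.txt
```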

When attempting to open the file in MEGAN’s “Meganize DAA File” dialog, you need to use the “Load Synonyms Mapping File” button on the Taxonomy tab. Your file suffix is .tab, which is not recognized by the file filter. Please change it to .txt. Alternatively, if you are using a Mac, press shift when clicking on the button; this will provide an alternative file browser in which you can set the filter to accept all file suffixes.
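
Changing the suffix can be done with a plain copy so that the original file stays in place. The snippet below is a minimal sketch; `as_txt` is a hypothetical helper name and `mapping.tab` is a placeholder created in a scratch directory just for the demo.

```shell
# Hypothetical helper: copy a mapping file to a sibling with a .txt suffix,
# leaving the original untouched, and print the new path.
as_txt() {
  src=$1
  dst="${src%.tab}.txt"
  cp -- "$src" "$dst" && printf '%s\n' "$dst"
}

# Demo in a scratch directory with a placeholder file name.
tmpdir=$(mktemp -d)
printf 'sp|Q6GZX4|001R_FRG3G\t654924\n' > "$tmpdir/mapping.tab"
as_txt "$tmpdir/mapping.tab"
```

The printed path is the file to select with the “Load Synonyms Mapping File” button.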

I just noticed that MEGAN throws an exception when a synonyms file is loaded, due to a minor bug. The exception can be ignored. I will upload a new release later today in which this exception is no longer thrown.

Hey Daniel!

Thank you for replying! I am doing exactly the same thing, but using the GTDB database directly for the blast analysis (rather than running blast against the nr database and then mapping to the GTDB mapping file).

Here is the script used-
##step1-
diamond blastx --db ${DB_DIR}/gtdb_all.faa.dmnd --query ${OUT_DIR}/3contigs-230-12.fasta --outfmt 100 --out ${OUT_DIR}/gtdb-matches-3contigs-230-12-contigs.txt --threads 15 --long-reads

##step2-
/home/j/jigyasa-arora/local/megan/tools/blast2rma --in gtdb-matches-3contigs-230-12-contigs.txt.daa --format DAA --blastMode BlastX --out gtdb-matches-3contigs-230-12-contigs.txt.rma --longReads --maxExpected 1e-15 --minPercentIdentity 50 --lcaAlgorithm longReads --lcaCoveragePercent 60 --syn2gtdb ${DB_DIR2}/accession2taxid2columns.txt --threads 16 --verbose

##step3-
/home/j/jigyasa-arora/local/megan/tools/rma2info --in gtdb-matches-3contigs-230-12-contigs.txt.rma --read2class GTDB --paths --out meganoutput-gtdb-matches-3contigs-230-12-contigs.txt

Step 3 (rma2info) was tested with both --read2class GTDB and --read2class Taxonomy, but both of them generate an empty output file. The RMA file is not empty, though. Why do you think it is generating an empty file?
Please let me know if I can share the files with you to examine the results. I am using command-line MEGAN version 6.21.1.

accession2taxid2columns.txt has two tab-separated columns:
accession taxid
GCA003023405 3392
GCA001587575 1400
GCA002763345 2845