I am trying to meganize a DAA file produced by running blastx against the SwissProt database. I followed the suggestions in this post: Import peptide identification to MEGAN
So I created a tab-separated file that maps SwissProt IDs to NCBI taxonomy IDs:
sp|Q6GZX4|001R_FRG3G 654924
sp|Q6GZX3|002L_FRG3G 654924
sp|Q197F8|002R_IIV3 345201
sp|Q197F7|003L_IIV3 345201
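The mapping is meant to be exactly two tab-separated columns, so it can be worth checking that before handing a multi-gigabyte file to MEGAN. A minimal awk sketch, assuming a header-free file with a numeric taxon ID in column 2; `check_mapping` is a hypothetical helper, not part of MEGAN:

```shell
# Hypothetical helper: print every line that is not "<id><TAB><numeric taxid>".
# Exits 0 if all lines are well-formed, 1 otherwise.
check_mapping() {
  awk -F'\t' '
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "line %d: %s\n", NR, $0   # report the offending line
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Usage: `check_mapping swissprot2taxid.txt` (file name hypothetical). A line separated by spaces instead of a tab shows up as a single field and is reported.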
Questions:
When I use this mapping file with the command-line version of MEGAN, I get a “java heap size” error, even after increasing the Java -Xmx flag to 500G.
The ID-mapping file is 6.4 GB in size.
When I try to meganize the DAA file in the GUI version, the daa-meganizer dialog box doesn’t accept my mapping file.
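One possible workaround for the heap-size problem, not something suggested in this thread, is to shrink the 6.4 GB mapping file to only the IDs that actually occur among the hits. This sketch assumes you already have a plain list of subject IDs, one per line (for example, column 2 of DIAMOND tabular output); `subset_mapping` and the file names are hypothetical, and whether this lowers MEGAN's memory use enough has not been tested.

```shell
# Hypothetical helper: keep only mapping lines whose first column
# appears in the list of hit IDs.
#   $1 = hit IDs, one per line
#   $2 = full mapping file (<id><TAB><taxid>)
subset_mapping() {
  awk -F'\t' '
    FILENAME == ARGV[1] { keep[$1]; next }   # first file: remember hit IDs
    $1 in keep                               # second file: print matching lines
  ' "$1" "$2"
}
```

Usage: `subset_mapping hit_ids.txt swissprot2taxid.txt > swissprot2taxid.subset.txt`. Testing `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps an empty ID list from being mistaken for the mapping file.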
When using daa-meganizer, you need to supply this file as a “synonyms file” using the option
--syn2taxa
When attempting to open the file in MEGAN’s “Meganize DAA File” dialog, you need to use the “Load Synonyms Mapping File” button on the Taxonomy tab. Your file suffix is .tab, which is not recognized by the file filter. Please change it to .txt or, if you are using a Mac, press Shift when clicking on the button; this opens an alternative file browser in which you can set the filter to accept all file suffixes.
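Following the suggestion to change the suffix, a minimal shell sketch that places a .txt copy next to the original .tab file; the function name `to_txt_suffix` is hypothetical. A copy is used rather than a rename so that any scripts pointing at the .tab file keep working.

```shell
# Hypothetical helper: copy <name>.tab to <name>.txt and print the new path.
# If the input does not end in .tab, .txt is simply appended.
to_txt_suffix() {
  local src=$1
  local dst="${src%.tab}.txt"
  cp -- "$src" "$dst" && printf '%s\n' "$dst"
}
```

Usage: `to_txt_suffix swissprot2taxid.tab` prints `swissprot2taxid.txt`, which can then be selected with “Load Synonyms Mapping File”.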
I just noticed that MEGAN throws an exception when a synonyms file is loaded, due to a minor bug. The exception can be ignored. I will upload a new release later today in which this exception is no longer thrown.
Thank you for replying! I am doing exactly the same thing, but I am searching the GTDB database directly in the blastx step (rather than searching against the nr database and then mapping with the GTDB mapping file).
Here is the script I used:
## step 1
diamond blastx --db ${DB_DIR}/gtdb_all.faa.dmnd --query ${OUT_DIR}/3contigs-230-12.fasta --outfmt 100 --out ${OUT_DIR}/gtdb-matches-3contigs-230-12-contigs.txt --threads 15 --long-reads
I tested step 3 (rma2info) with both --read2class GTDB and --read2class Taxonomy, but both runs produce an empty output file, even though the RMA file itself is not empty. Why do you think it is generating an empty file?
Please let me know if I can share the files with you so that you can examine the results. I am using the command-line version of MEGAN, 6.21.1.
accession2taxid2columns.txt has two tab-separated columns, namely:
accession taxid
GCA003023405 3392
GCA001587575 1400
GCA002763345 2845
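Because step 3 produces no output at all, one thing worth ruling out is a mismatch between the keys in column 1 of accession2taxid2columns.txt (e.g. GCA003023405) and the subject IDs that DIAMOND reports from gtdb_all.faa. The sketch below counts exact matches between the two. `count_matched_ids` and `hit_ids.txt` are hypothetical, the ID list is assumed to have been extracted separately, and MEGAN's own ID parsing may be more lenient than this exact comparison, so a count of 0 is a hint rather than proof.

```shell
# Hypothetical helper: count distinct accessions from the mapping file
# that exactly match a hit ID.
#   $1 = hit IDs, one per line
#   $2 = accession<TAB>taxid file (an "accession" header line is skipped)
count_matched_ids() {
  awk -F'\t' '
    FILENAME == ARGV[1] { ids[$1]; next }      # load hit IDs
    FNR == 1 && $1 == "accession" { next }     # skip optional header
    $1 in ids { matched[$1] }                  # record exact matches
    END { n = 0; for (k in matched) n++; print n }
  ' "$1" "$2"
}
```

Usage: `count_matched_ids hit_ids.txt accession2taxid2columns.txt`. If the subject IDs carry an underscore, a version suffix, or a protein suffix (for instance `GCA_003023405.1_...`), the count will be 0 even though the genomes are the same.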