A question about importing DIAMOND alignment results into MEGAN6

Hello everyone!

I am new to DIAMOND and MEGAN6. This question may sound silly, but I have spent the past few days trying to import a DIAMOND alignment result file into MEGAN and visualize it.

So far, I have generated output in several formats: .daa, .sam, .m0, .xml and .m8.
I have tried importing them into MEGAN6 through File > Import From BLAST, and through File > Meganize DAA File for the .daa files. However, only two nodes appear, and all the reads end up in the node “Not assigned”.

I have searched online for a solution on Biostars, Stack Overflow and here in the MEGAN community, but I still cannot solve it…

I am aligning my fastq file to the NCBI nr database with the following command:

diamond blastx --query ./BP7-A/Sample.fastq --db nr --daa ./DAA/S1000_1.daa

I have followed the instructions in “Generic pipeline using DIAMOND and MEGAN6”, but I still get only two nodes.
I have already downloaded the GI-to-NCBI mapping file and the accession mapping file.

The following are some of the records inside the .m8, .sam and .xml files.

.m8 file:

HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	CDA22218.1	84.8	33	5	0	3	101	440	472	1.8e-07	62.8
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_068962080.1	80.6	31	6	0	3	95	444	474	9.7e-06	57.0
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_046147429.1	63.6	33	12	0	3	101	440	472	2.4e-04	52.4
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_068185816.1	63.6	33	12	0	3	101	440	472	3.1e-04	52.0 

.sam file:

@HD	    VN:1.5	    SO:query
@PG	    PN:DIAMOND
@mm	    BlastX
@CO	    BlastX-like alignments
@CO	    Reporting AS: bitScore, ZR: rawScore, ZE: expected, ZI: percent identity, ZL: reference length, ZF: frame, ZS: query start DNA coordinate
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	CDA22218.1	440	255	33M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYNQL	*	AS:i:62	NM:i:5	ZL:i:1072	ZR:i:151	ZE:f:1.8e-07	ZI:i:84	ZF:i:3	ZS:i:3	MD:Z:D3D9E6R6K4
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	WP_068962080.1	444	255	31M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYN	*	AS:i:56	NM:i:6	ZL:i:1079	ZR:i:136	ZE:f:9.7e-06	ZI:i:80	ZF:i:3	ZS:i:3	MD:Z:D3E2K6Y6A6G2
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	WP_046147429.1	440	255	33M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYNQL	*	AS:i:52	NM:i:12	ZL:i:1074	ZR:i:124	ZE:f:2.4e-04	ZI:i:63	ZF:i:3	ZS:i:3	MD:Z:D6K2K3Q2Y2I3GA3FSKI

.xml file:

<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>HWI-C00135:230:CAG5BANXX:7:2309:15572:61333</Iteration_query-def>
  <Iteration_query-len>101</Iteration_query-len>
<Iteration_hits>
<Hit>
 <Hit_num>1</Hit_num>
  <Hit_id>gnl|BL_ORD_ID|460</Hit_id>
  <Hit_def>CDA22218.1</Hit_def> 
  <Hit_accession>460</Hit_accession>
  <Hit_len>1072</Hit_len>
  <Hit_hsps>

I am sorry for such a long post, but I really need some help with this. Thank you very much!

The key issue is how you run import into MEGAN or meganization of a daa file, in particular how you specify the mapping files, and whether you are using the correct mapping files. Please show the details of this.

Thanks for the reply Daniel!

So, I import my .sam and other files as follows:

and I specify the mapping files as follows. I have tried both the nucleotide and the protein mapping files; the .bin file was downloaded from the MEGAN download webpage:

When I use normal BLAST, I can run MEGAN successfully and the taxonomic tree is displayed properly.

Thank you very much.

You are using the wrong mapping file: nucl_acc2tax-Nov2016.abin is for use with a nucleotide reference database, but your comparison was against a protein reference database, so you need to use the file called prot_acc2tax-Nov2016.abin.

Sorry for the confusing naming convention (it mirrors the way that NCBI names their files), I really should add some code to MEGAN that catches this quite common mistake…
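
For anyone scripting their pipeline, the kind of check described above can be sketched in a few lines of Python. This is only an illustration, not code from MEGAN: the function name `mapping_file_matches` and the lookup tables are hypothetical, and the file-name prefixes are the ones quoted in this thread (other releases of the mapping files may be named differently).

```python
from pathlib import Path

# Which kind of reference database each aligner mode searches.
# DIAMOND blastx and blastp both align against protein references.
REFERENCE_TYPE = {
    "diamond blastx": "protein",
    "diamond blastp": "protein",
    "blastx": "protein",
    "blastp": "protein",
    "blastn": "nucleotide",
}

# File-name prefixes as quoted in this thread (assumption: other
# releases of the mapping files may use different names).
EXPECTED_PREFIX = {"protein": "prot_acc2tax", "nucleotide": "nucl_acc2tax"}


def mapping_file_matches(program, mapping_file):
    """Return True if the mapping file name fits the aligner's reference type."""
    reference_type = REFERENCE_TYPE[program.strip().lower()]
    return Path(mapping_file).name.startswith(EXPECTED_PREFIX[reference_type])


if __name__ == "__main__":
    # The combination from the original question: protein alignments,
    # nucleotide mapping file.
    print(mapping_file_matches("diamond blastx", "nucl_acc2tax-Nov2016.abin"))  # False
    print(mapping_file_matches("diamond blastx", "prot_acc2tax-Nov2016.abin"))  # True
```

A program name that is not in the table raises a KeyError, which seems safer than silently guessing the reference type.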

No no! Thank you very much! I finally got the taxonomic tree!

I should have double-checked the mapping file to see whether I was using the wrong one…

Thanks a lot Daniel!!

Hi Daniel,
I am having a problem unzipping prot_acc2tax-Nov2016.abin.zip on a Mac, although all the other files worked perfectly. Is the uploaded version corrupt by any chance? The previous version works fine.
For this particular file, if I double-click it, it turns into a .cpgz file, and if I use gunzip in the terminal, it says “gunzip: /Users/medsmit/Documents/MEGAN_mapping_files/prot_acc2tax-Nov2016.abin.zip: unknown suffix – ignored”

Thanks,
S

Not sure what the problem is. I downloaded the file and then unzipped in a terminal window:

huson@haifisch:~$ unzip prot_acc2tax-Nov2016.abin.zip 
Archive:  prot_acc2tax-Nov2016.abin.zip
inflating: prot_acc2tax-Nov2016.abin
huson@haifisch:~$

Dear Daniel,
Thanks. Could this be a problem with a different country mirror? I have no idea. I downloaded the file again and tried to unzip it, but unfortunately I get the same error.

LIMM-Suparna-Mitra-MBPro13-LIBACS:Desktop medsmit$ unzip prot_acc2tax-Nov2016.abin.zip
Archive: prot_acc2tax-Nov2016.abin.zip
warning [prot_acc2tax-Nov2016.abin.zip]: 76 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [prot_acc2tax-Nov2016.abin.zip]: reported length of central directory is
-76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1
zipfile?). Compensating…
skipping: prot_acc2tax-Nov2016.abin need PK compat. v4.5 (can do v2.1)

note: didn’t find end-of-central-dir signature at end of central dir.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
LIMM-Suparna-Mitra-MBPro13-LIBACS:Desktop medsmit

All the other files are working fine though.

And this is the first time I have had this problem.
Thanks,
Suparna
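
A note on the error above: the line "need PK compat. v4.5 (can do v2.1)" suggests that the archive needs features from version 4.5 of the zip format (typically the ZIP64 extensions used for very large files), while this unzip build supports only up to version 2.1. If that is the cause, the download itself may be intact and a ZIP64-aware tool should be able to extract it. The following is a minimal sketch using Python's standard zipfile module, which reads ZIP64 archives. The helper name `extract_archive` is hypothetical and not part of MEGAN, and nobody in this thread has confirmed that this resolves the problem.

```python
import zipfile


def extract_archive(archive, dest_dir="."):
    """Check a zip archive's CRCs, extract it, and return the member names."""
    # ZipFile raises zipfile.BadZipFile if the file is not a readable zip
    # archive (for example, after a truncated download).
    with zipfile.ZipFile(archive) as zf:
        # testzip() reads every member and returns the name of the first
        # one whose CRC does not match, or None if all members are fine.
        bad_member = zf.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad_member}")
        zf.extractall(dest_dir)
        return zf.namelist()
```

Calling `extract_archive("prot_acc2tax-Nov2016.abin.zip")` from the download directory would either write the .abin file next to the archive or raise an error pointing to a damaged download. Since the archive is large, the CRC check and the extraction each take a while.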

Hi @Daniel, I am having a similar issue to the OP:

I did a diamond blastx against nr with a filtered fastq that was sequenced on Illumina NextSeq:

diamond blastx -d nr -q VLP_005-36.fastq -o VLP_005-36_nr.daa -p 48 --outfmt 100

The tabular output of this .daa file looks like:

NB551033:17:H27G7BGX2:1:11101:3237:1089	WP_055171390.1	58.0	50	21	0	151	2	269	318	1.2e-07	63.9
NB551033:17:H27G7BGX2:1:11101:3237:1089	WP_055256333.1	58.0	50	21	0	151	2	250	299	1.2e-07	63.9
NB551033:17:H27G7BGX2:1:11101:3237:1089	SCH20855.1	58.0	50	21	0	151	2	511	560	1.6e-07	63.5
NB551033:17:H27G7BGX2:1:11101:25987:1065	WP_055256332.1	58.0	50	20	1	3	149	34	83	6.5e-09	68.2
NB551033:17:H27G7BGX2:1:11101:25987:1065	WP_055171393.1	58.0	50	20	1	3	149	29	78	6.5e-09	68.2
NB551033:17:H27G7BGX2:1:11101:25987:1065	SCH20807.1	56.0	50	21	1	3	149	34	83	3.2e-08	65.9
NB551033:17:H27G7BGX2:1:11101:4896:1058	SCH20754.1	70.0	40	12	0	3	122	314	353	2.3e-06	59.7
NB551033:17:H27G7BGX2:1:11101:10573:1121	SCH20855.1	83.7	49	8	0	3	149	16	64	4.9e-17	95.1
NB551033:17:H27G7BGX2:1:11101:10573:1121	CDL65712.1	63.3	49	18	0	3	149	15	63	2.4e-11	76.3
NB551033:17:H27G7BGX2:1:11101:21716:1122	WP_055171390.1	78.0	50	11	0	1	150	349	398	1.9e-16	93.2

Following this, I MEGANized the file in the GUI MEGAN:

Used prot_acc2tax-Nov2016.abin for accession mapping file. Parse Taxon Names checked.

LCA parameters:

And still I am only getting two nodes, with more than 3 million unassigned reads. Any idea what may be going wrong here? Any help would be appreciated, as I have been trying to solve this for several weeks!

Best,
Casey Jones

Everything looks correct. Are you sure that you are providing the mapping file in the correct manner?

Hi Daniel,

I believe so. See screenshot:

Are there any options I’m missing? Is there something wrong with my daa output? I don’t get assigned taxonomy when I follow the daa2rma steps on command line either, however there are hits in my diamond output so I’m not sure where the issue lies.

Thanks,
Casey
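
To confirm that the DIAMOND output really contains hits before looking at the MEGAN side, the tabular output can be summarised with a short script. The sketch below assumes the standard 12-column BLAST tabular layout, which matches the rows shown in this thread; `hits_per_read` is a hypothetical helper name.

```python
from collections import defaultdict

# Standard 12-column BLAST tabular layout, as in the rows shown above.
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def hits_per_read(lines):
    """Map each read ID to its (accession, bit score) hits, in file order."""
    hits = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        hits[row["qseqid"]].append((row["sseqid"], float(row["bitscore"])))
    return dict(hits)


if __name__ == "__main__":
    # Three rows copied from the tabular output posted above.
    sample = [
        "NB551033:17:H27G7BGX2:1:11101:4896:1058\tSCH20754.1\t70.0\t40\t12\t0\t3\t122\t314\t353\t2.3e-06\t59.7",
        "NB551033:17:H27G7BGX2:1:11101:10573:1121\tSCH20855.1\t83.7\t49\t8\t0\t3\t149\t16\t64\t4.9e-17\t95.1",
        "NB551033:17:H27G7BGX2:1:11101:10573:1121\tCDL65712.1\t63.3\t49\t18\t0\t3\t149\t15\t63\t2.4e-11\t76.3",
    ]
    for read_id, read_hits in hits_per_read(sample).items():
        best = max(read_hits, key=lambda hit: hit[1])
        print(read_id, len(read_hits), best[0])
```

If most reads have hits here but every read is still "Not assigned" in MEGAN, the alignments themselves are probably fine and the accession-to-taxonomy mapping step is the more likely place to look.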

Hi Casey,

It is not clear to me what is going on. Could you please give me access to a small file that exhibits the problem and then I will be able to figure this out immediately…

Sent you a file privately @Daniel!

I was able to analyse your file without any problems on my computer. My guess is that there was a problem with the mapping file prot_acc2tax-Nov2016.abin as presented on our download website.
I re-uploaded the file, then downloaded it and tested it. Please check whether re-downloading the mapping file fixes the problem.
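
One way to see whether a fresh download actually differs from the copy already on disk is to compare checksums of the two files. No reference checksum is given in this thread, so the sketch below only compares two local copies with each other; `sha256_of` is a hypothetical helper and the file names in the usage note are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte mapping file never has to
        # fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of("old/prot_acc2tax-Nov2016.abin") == sha256_of("new/prot_acc2tax-Nov2016.abin")` is False when the two copies differ, which would be consistent with the earlier copy having been damaged or incomplete.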


I will try this out, thank you @Daniel. Would there be any way that you could send me your resulting .rma output from my .daa file?

Thanks,
Casey

Hi @Daniel - the mapping file was the issue. Thanks for trying out my file and solving the problem.

All the best!
Casey