A question about importing DIAMOND alignment results into MEGAN6

HMarcus · March 21, 2017, 9:22pm

Hello everyone!

I am a new user of DIAMOND and MEGAN6. This question may sound silly, but I have spent the last few days trying to import a DIAMOND alignment result file into MEGAN and visualize it.

So, I have generated several output formats: .daa, .sam, .m0, .xml and .m8 files.
I have tried to import them into MEGAN6 through File > Import From BLAST, and through File > Meganize DAA File for the .daa files. However, only two nodes appear and all the reads are in the node “Not assigned”.

I have searched online for a solution, on Biostars, Stack Overflow and here in the MEGAN community. However, I still cannot solve it…

I am aligning my fastq file to the NCBI nr database with the following command:

diamond blastx --query ./BP7-A/Sample.fastq --db nr --daa ./DAA/S1000_1.daa

I have followed the instructions in “Generic pipeline using DIAMOND and MEGAN6”, but I still get only two nodes.
I have already downloaded the GI-to-NCBI mapping file and also the accession mapping file.

The following are some of the records from the .m8, .sam and .xml files.

.m8 file:

HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	CDA22218.1	84.8	33	5	0	3	101	440	472	1.8e-07	62.8
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_068962080.1	80.6	31	6	0	3	95	444	474	9.7e-06	57.0
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_046147429.1	63.6	33	12	0	3	101	440	472	2.4e-04	52.4
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	WP_068185816.1	63.6	33	12	0	3	101	440	472	3.1e-04	52.0

.sam file:

@HD	    VN:1.5	    SO:query
@PG	    PN:DIAMOND
@mm	    BlastX
@CO	    BlastX-like alignments
@CO	    Reporting AS: bitScore, ZR: rawScore, ZE: expected, ZI: percent identity, ZL: reference length, ZF: frame, ZS: query start DNA coordinate
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	CDA22218.1	440	255	33M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYNQL	*	AS:i:62	NM:i:5	ZL:i:1072	ZR:i:151	ZE:f:1.8e-07	ZI:i:84	ZF:i:3	ZS:i:3	MD:Z:D3D9E6R6K4
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	WP_068962080.1	444	255	31M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYN	*	AS:i:56	NM:i:6	ZL:i:1079	ZR:i:136	ZE:f:9.7e-06	ZI:i:80	ZF:i:3	ZS:i:3	MD:Z:D3E2K6Y6A6G2
HWI-C00135:230:CAG5BANXX:7:2309:15572:61333	0	WP_046147429.1	440	255	33M	*	0	0	EQIHALTRIDRWFLNKLHNIVQTADELESYNQL	*	AS:i:52	NM:i:12	ZL:i:1074	ZR:i:124	ZE:f:2.4e-04	ZI:i:63	ZF:i:3	ZS:i:3	MD:Z:D6K2K3Q2Y2I3GA3FSKI

.xml file:

<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>HWI-C00135:230:CAG5BANXX:7:2309:15572:61333</Iteration_query-def>
  <Iteration_query-len>101</Iteration_query-len>
<Iteration_hits>
<Hit>
 <Hit_num>1</Hit_num>
  <Hit_id>gnl|BL_ORD_ID|460</Hit_id>
  <Hit_def>CDA22218.1</Hit_def> 
  <Hit_accession>460</Hit_accession>
  <Hit_len>1072</Hit_len>
  <Hit_hsps>

I am sorry for such a long post, but I really need some help with this. Thank you very much!

Daniel · March 23, 2017, 12:15pm

The key issue is how you run the import into MEGAN, or the meganization of a DAA file: in particular, how you specify the mapping files and whether you are using the correct ones. Please show the details of this.

HMarcus · March 23, 2017, 4:17pm

Thanks for the reply Daniel!

So, I import my .sam and other files as follows:

I specify the mapping files as follows. I have tried both the nucleotide and the protein mapping files, and the .bin file was downloaded from the MEGAN download webpage:

When I use normal BLAST, I can run MEGAN successfully and get a well-presented taxonomic tree.

Thank you very much.

Daniel · March 24, 2017, 8:34am

You are using the wrong mapping file: nucl_acc2tax-Nov2016.abin is for use with a nucleotide reference database, but your comparison was against a protein reference database, so you need to use the file called prot_acc2tax-Nov2016.abin.

Sorry for the confusing naming convention (it mirrors the way that NCBI names their files). I really should add some code to MEGAN that catches this quite common mistake…
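
The rule described above can be written down as a small lookup. This is only an illustrative sketch and not MEGAN or DIAMOND code: the function name `mapping_file_for` and the mode tables are assumptions made for the example, while the two file names are the ones mentioned in this thread.

```python
# Hypothetical helper (not part of MEGAN or DIAMOND): choose the accession
# mapping file from the type of reference database that was searched.

# Alignment modes whose reference database contains protein sequences.
PROTEIN_REFERENCE_MODES = {"blastx", "blastp"}
# Alignment modes whose reference database contains nucleotide sequences.
NUCLEOTIDE_REFERENCE_MODES = {"blastn"}


def mapping_file_for(mode: str) -> str:
    """Return the mapping file name that matches the reference database."""
    mode = mode.lower()
    if mode in PROTEIN_REFERENCE_MODES:
        return "prot_acc2tax-Nov2016.abin"
    if mode in NUCLEOTIDE_REFERENCE_MODES:
        return "nucl_acc2tax-Nov2016.abin"
    raise ValueError(f"unknown alignment mode: {mode!r}")
```

For a `diamond blastx` run against nr, `mapping_file_for("blastx")` returns the protein file, because the reference sequences are proteins even though the reads are nucleotides.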

HMarcus · March 24, 2017, 12:58pm

No no! Thank you very much! I finally got the taxonomic tree!

I should have double-checked the mapping file to see whether I was using the wrong one…

Thanks a lot Daniel!!

smitra · April 3, 2017, 5:58pm

Hi Daniel,
I am having a problem unzipping prot_acc2tax-Nov2016.abin.zip on a Mac, although all the other files worked perfectly. Is the uploaded version corrupt, by any chance? The previous version worked fine.
For this particular file, if I double-click it, it turns into a .cpgz file. And if I use gunzip in the terminal, it says “gunzip: /Users/medsmit/Documents/MEGAN_mapping_files/prot_acc2tax-Nov2016.abin.zip: unknown suffix -- ignored”

Thanks,
S

Daniel · April 6, 2017, 10:37am

Not sure what the problem is. I downloaded the file and then unzipped in a terminal window:

huson@haifisch:~$ unzip prot_acc2tax-Nov2016.abin.zip 
Archive:  prot_acc2tax-Nov2016.abin.zip
inflating: prot_acc2tax-Nov2016.abin
huson@haifisch:~$

smitra · April 7, 2017, 6:25pm

Dear Daniel,
Thanks. Could this be a problem with a mirror in a different country? I have no idea. I tried downloading and unzipping again, but unfortunately I get the same error.

LIMM-Suparna-Mitra-MBPro13-LIBACS:Desktop medsmit$ unzip prot_acc2tax-Nov2016.abin.zip
Archive: prot_acc2tax-Nov2016.abin.zip
warning [prot_acc2tax-Nov2016.abin.zip]: 76 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [prot_acc2tax-Nov2016.abin.zip]: reported length of central directory is
-76 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1
zipfile?). Compensating...
skipping: prot_acc2tax-Nov2016.abin need PK compat. v4.5 (can do v2.1)

note: didn't find end-of-central-dir signature at end of central dir.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
LIMM-Suparna-Mitra-MBPro13-LIBACS:Desktop medsmit$

All the other files work fine, though.

And this is the first time I have had this problem.
Thanks,
Suparna
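
The message “need PK compat. v4.5 (can do v2.1)” generally means that the archive entry requires features from version 4.5 of the ZIP specification (typically ZIP64, which is used for very large files), while this build of unzip only supports features up to version 2.1. One possible workaround, sketched here and not tested on this particular file, is to check and extract the archive with Python's standard `zipfile` module, which can read ZIP64 archives. The function name `verify_and_extract` and the paths are placeholders. If the download itself is damaged, this will fail as well.

```python
import zipfile
from pathlib import Path


def verify_and_extract(zip_path: str, dest_dir: str) -> list:
    """Verify a zip archive's member checksums, then extract it into dest_dir."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a readable zip archive")
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and returns the name of the first
        # one whose CRC does not match, or None if all members are intact.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in archive: {bad_member}")
        archive.extractall(dest_dir)
        return [str(Path(dest_dir) / name) for name in archive.namelist()]


# Example call (file name taken from this thread, destination is arbitrary):
# verify_and_extract("prot_acc2tax-Nov2016.abin.zip", ".")
```

Note that `testzip()` reads the whole archive once before extraction, so on a multi-gigabyte mapping file this takes a while.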

Casey · April 12, 2017, 5:08pm

Hi @Daniel, I am having a similar issue to the OP:

I ran diamond blastx against nr with a filtered fastq file that was sequenced on an Illumina NextSeq:

diamond blastx -d nr -q VLP_005-36.fastq -o VLP_005-36_nr.daa -p 48 --outfmt 100

The tabular output of this .daa file looks like this:

NB551033:17:H27G7BGX2:1:11101:3237:1089	WP_055171390.1	58.0	50	21	0	151	2	269	318	1.2e-07	63.9
NB551033:17:H27G7BGX2:1:11101:3237:1089	WP_055256333.1	58.0	50	21	0	151	2	250	299	1.2e-07	63.9
NB551033:17:H27G7BGX2:1:11101:3237:1089	SCH20855.1	58.0	50	21	0	151	2	511	560	1.6e-07	63.5
NB551033:17:H27G7BGX2:1:11101:25987:1065	WP_055256332.1	58.0	50	20	1	3	149	34	83	6.5e-09	68.2
NB551033:17:H27G7BGX2:1:11101:25987:1065	WP_055171393.1	58.0	50	20	1	3	149	29	78	6.5e-09	68.2
NB551033:17:H27G7BGX2:1:11101:25987:1065	SCH20807.1	56.0	50	21	1	3	149	34	83	3.2e-08	65.9
NB551033:17:H27G7BGX2:1:11101:4896:1058	SCH20754.1	70.0	40	12	0	3	122	314	353	2.3e-06	59.7
NB551033:17:H27G7BGX2:1:11101:10573:1121	SCH20855.1	83.7	49	8	0	3	149	16	64	4.9e-17	95.1
NB551033:17:H27G7BGX2:1:11101:10573:1121	CDL65712.1	63.3	49	18	0	3	149	15	63	2.4e-11	76.3
NB551033:17:H27G7BGX2:1:11101:21716:1122	WP_055171390.1	78.0	50	11	0	1	150	349	398	1.9e-16	93.2

Following this, I meganized the file in the MEGAN GUI:

I used prot_acc2tax-Nov2016.abin as the accession mapping file, with Parse Taxon Names checked.

LCA parameters:

I am still only getting two nodes, with more than 3 million unassigned reads. Any idea what may be going wrong here? Any help would be appreciated, as I have been trying to solve this for several weeks!

Best,
Casey Jones
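
To see which reference accessions the mapping file has to resolve, one can pull the second column out of tabular output like the lines shown above. A minimal sketch, assuming tab-separated BLAST-style tabular lines with the subject ID in column 2; the function name `subject_accessions` is made up for illustration, and the example lines are copied from the output above.

```python
from collections import Counter
from typing import Iterable


def subject_accessions(lines: Iterable[str]) -> Counter:
    """Count how often each subject ID (column 2) occurs in tabular output."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


example = [
    "NB551033:17:H27G7BGX2:1:11101:3237:1089\tWP_055171390.1\t58.0\t50\t21\t0\t151\t2\t269\t318\t1.2e-07\t63.9",
    "NB551033:17:H27G7BGX2:1:11101:21716:1122\tWP_055171390.1\t78.0\t50\t11\t0\t1\t150\t349\t398\t1.9e-16\t93.2",
    "NB551033:17:H27G7BGX2:1:11101:4896:1058\tSCH20754.1\t70.0\t40\t12\t0\t3\t122\t314\t353\t2.3e-06\t59.7",
]
for accession, hits in subject_accessions(example).most_common():
    print(accession, hits)
```

IDs with the WP_ prefix are RefSeq protein accessions, which is consistent with using the protein (prot_acc2tax) mapping file for this kind of output.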

Daniel · April 13, 2017, 10:29am

Everything looks correct. Are you sure that you are providing the mapping file in the correct manner?

Casey · April 13, 2017, 4:04pm

Hi Daniel,

I believe so. See screenshot:

Are there any options I’m missing? Is there something wrong with my daa output? I don’t get any assigned taxonomy when I follow the daa2rma steps on the command line either. However, there are hits in my DIAMOND output, so I’m not sure where the issue lies.

Thanks,
Casey

Daniel · April 14, 2017, 7:06am

Hi Casey,

It is not clear to me what is going on. Could you please give me access to a small file that exhibits the problem and then I will be able to figure this out immediately…

Casey · April 18, 2017, 4:06pm

Sent you a file privately @Daniel!

Daniel · April 19, 2017, 8:48am

I was able to analyse your file without any problems on my computer. My guess is that there was a problem with the mapping file prot_acc2tax-Nov2016.abin as provided on our download website.
I re-uploaded the file, then downloaded it and tested it. Please check whether re-downloading the mapping file fixes the problem.
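
One way to check whether a re-downloaded copy really differs from the old one is to compare checksums of the two files. This is a minimal sketch: the function names `sha256_of` and `same_content` and the paths in the example comment are hypothetical, and no official checksum for the mapping file is assumed here.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that a multi-gigabyte file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have identical contents (compared by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example (hypothetical paths for the old and the new download):
# same_content("old/prot_acc2tax-Nov2016.abin", "new/prot_acc2tax-Nov2016.abin")
```

If the two digests differ, the old and new downloads are not the same file, which fits the guess that the earlier copy was damaged.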

Casey · April 19, 2017, 4:26pm

I will try this out, thank you @Daniel. Would there be any way that you could send me your resulting .rma output from my .daa file?

Thanks,
Casey

Casey · April 19, 2017, 8:59pm

Hi @Daniel - the mapping file was the issue. Thanks for trying out my file and solving the problem.

All the best!
Casey