Problem importing BlastXML files to MEGAN (6.5.10)

Alexsal86 · November 29, 2016, 8:03pm

Hello everyone!

I am trying to import BlastXML output into MEGAN (6.5.10). The output was obtained by running BLASTp with DIAMOND, but the import fails. The error message indicates that MEGAN cannot recognize the input file format (XML), even though it is listed among the formats the software supports. It does, however, recognize the same output when it is saved in DAA format. What puzzles me most is that importing BlastXML into MEGAN6 sometimes worked for other, similar files. The issue seems to occur at random…
My inputs are FASTA files from metagenomic sequencing on an Illumina MiSeq (the samples used for blastp were previously demultiplexed, assembled, and quality-checked).

Here is an example of my typical XML file:

<?xml version="1.0"?> blastx diamond 0.8.24 Benjamin Buchfink, Xie Chao, and Daniel Huson (2015), "Fast and sensitive protein alignment using DIAMOND", Nature Methods 12:59-60. Query_1 J00146:7:H7YGKBBXX:6:2208:10439:25386:GATCAG 284 blosum62 0.001 11 1

Please, any advice on this topic would be greatly appreciated!

Thank you

Daniel · November 30, 2016, 9:32am

Please use DIAMOND’s .daa output to import into MEGAN.
The fastest and most space-efficient process is to run the command line program tools/daa-meganizer or to use MEGAN’s File->Meganize DAA File… menu item.
Nevertheless, I will look into the problem that you described (but using XML is by far the slowest and most verbose option).

Alexsal86 · November 30, 2016, 12:06pm

Thank you!

I will try what you suggested. However, the main problem is that my original datasets are very large (several million sequences each), so my idea was to first split each original file into smaller chunks, process them in parallel with DIAMOND on a server, merge the output into a single file, and then import it into MEGAN to obtain a single .rma6 file. As far as I understand, it is not possible to merge several .daa files into a single, larger file; am I right? Should I instead import all of those .daa files into MEGAN together to create a single .rma6 file, rather than merging them beforehand?
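
For reference, the splitting step I have in mind could look roughly like the sketch below. It is only an illustration, not a tested pipeline: the function names (`read_fasta`, `split_fasta`), the chunk file naming, and the chunk size are my own placeholders, and it assumes plain, uncompressed FASTA input. It does not call DIAMOND or MEGAN at all.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header = None
    seq_lines: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(seq_lines)
                header = line
                seq_lines = []
            elif line and header is not None:
                seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_fasta(path: Path, out_dir: Path, records_per_chunk: int) -> List[Path]:
    """Write consecutive chunks of at most records_per_chunk records.

    Returns the chunk paths in order. The naming scheme is hypothetical.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths: List[Path] = []
    out = None
    try:
        for index, (header, seq) in enumerate(read_fasta(path)):
            if index % records_per_chunk == 0:
                # Start a new chunk file every records_per_chunk records.
                if out is not None:
                    out.close()
                chunk_path = out_dir / f"{path.stem}.chunk{len(chunk_paths):04d}.fasta"
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w")
            out.write(f"{header}\n{seq}\n")
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each chunk file would then be aligned separately with DIAMOND on the server. Because records are never split across chunks, the set of reads in all chunks together is the same as in the original file.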

Best,
Alejandro

Alexsal86 · November 30, 2016, 2:07pm

Sorry for replying again.

Just a quick question that I forgot to include in my previous post: you suggested importing .daa files into MEGAN instead of BLAST output. However, as far as I can see, a .daa file cannot be imported together with the original FASTA file containing the raw sequences for that .daa file. Those sequences are necessary if one wants to retrieve the reads of a specific functional group for further analysis, which is what I am interested in doing.

Is there a way to import a .daa file together with its corresponding fasta dataset?

Thank you again!