I am trying to determine the fungal microbiome inside chestnut trees using Illumina sequencing. I am not very good at coding, so I am looking for a way to analyze my Illumina data that requires little or no coding. The sequencing facility I am using is willing to help by aligning my sequences and putting them into a file format that will work for me. I am hoping that MEGAN will help me determine the number of different species present in the trees I am studying. What file format should I request from the sequencing facility? Thank you.
Ideally, ask them to use the alignment program DIAMOND to compare your reads against the NCBI nr protein database, producing .daa files as output.
DIAMOND is perhaps the fastest aligner, and DAA files are the format that MEGAN processes fastest (see the MEGAN Community Edition paper in PLoS Computational Biology for details).
If they are using some other alignment tool, then any BLAST-like output format would be good: text format or XML format if you want to be able to, e.g., assemble reads in MEGAN or inspect reads or alignments, or m8 tab format otherwise.
Thank you. How long should it take an XML file to load? I have been sent a file to test, but it seems to be taking an incredibly long time to import (hours). Is there a right and a wrong way to import the file?
Unfortunately, that can indeed take a long time, multiple days if the file is tens of GB in size…
Currently, the best way to get data into MEGAN is to use DIAMOND to compute DAA files and then to meganize the DAA files…
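In case it helps when talking to the sequencing facility, here is a rough sketch of the DIAMOND step, wrapped in Python. It assumes the reads are DNA (hence `blastx`) and that a DIAMOND-formatted copy of nr already exists; the function names, file names, and thread count are placeholders I made up. `--outfmt 100` is DIAMOND's DAA output format. Meganizing the resulting file is a separate step done with MEGAN itself and is not shown here.

```python
import shlex
import subprocess


def diamond_daa_command(reads, database="nr", out="sample.daa", threads=8):
    """Build a DIAMOND blastx command line that writes a DAA file.

    All argument values are placeholders; adjust them to your own data.
    """
    return [
        "diamond", "blastx",
        "--db", database,          # DIAMOND-formatted protein database (assumed to exist)
        "--query", reads,          # DNA reads in FASTA or FASTQ format
        "--out", out,              # output file
        "--outfmt", "100",         # 100 = DIAMOND alignment archive (DAA)
        "--threads", str(threads),
    ]


def run_diamond(reads, **kwargs):
    """Run the command; requires DIAMOND to be installed and on the PATH."""
    subprocess.run(diamond_daa_command(reads, **kwargs), check=True)


# Print the command so it can be copied into a terminal or sent to the facility.
print(shlex.join(diamond_daa_command("sample_reads.fastq")))
```

Only the command is printed here; calling `run_diamond("sample_reads.fastq")` would actually start the alignment.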
The file I am attempting to import now is an 18 GB .xml file. How long should I expect it to take to load before I give up and try something different?
MEGAN will get there eventually. Give the program as much memory as you can, as more memory means more speed for this type of thing (less time spent on garbage collection).
Sorry that MEGAN isn’t very good at parsing large XML files. Perhaps use some converter to convert into text format or tab format and then try that? That should be faster.
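If no ready-made converter is at hand, something along these lines might do as a starting point. It is only a sketch: it assumes the file follows the usual BLAST XML layout (Iteration, Hit, and Hsp elements), the function names are my own, and it writes the 12 standard m8 columns. It reads the XML as a stream, so an 18 GB file should not need to fit in memory. Note that tab format drops the alignment text, so you would not be able to inspect alignments in MEGAN afterwards.

```python
import re
import xml.etree.ElementTree as ET


def count_gap_openings(*aligned_seqs):
    """Count runs of '-' in the aligned query and subject strings."""
    return sum(len(re.findall(r"-+", seq or "")) for seq in aligned_seqs)


def blast_xml_to_m8(xml_source, out):
    """Stream BLAST XML from xml_source and write one m8 line per HSP to out.

    xml_source is a file name or a binary file object; out is a text file
    object. Returns the number of lines written.
    """
    container = None  # the <BlastOutput_iterations> element, emptied as we go
    query_id = hit_id = hit_def = ""
    rows = 0
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            if elem.tag == "BlastOutput_iterations":
                container = elem
            continue
        if elem.tag == "Iteration_query-def":
            words = (elem.text or "").split()
            query_id = words[0] if words else ""
        elif elem.tag == "Hit_id":
            hit_id = (elem.text or "").strip()
        elif elem.tag == "Hit_def":
            hit_def = (elem.text or "").strip()
        elif elem.tag == "Hsp":
            # Assumption: BLAST+ writes placeholder ids of this form when the
            # database lacks parsed ids; the real name is then in Hit_def.
            subject = hit_id
            if hit_id.startswith("gnl|BL_ORD_ID|") and hit_def:
                subject = hit_def.split()[0]
            length = int(elem.findtext("Hsp_align-len"))
            identity = int(elem.findtext("Hsp_identity"))
            gaps = int(elem.findtext("Hsp_gaps", "0"))
            fields = [
                query_id,
                subject,
                f"{100.0 * identity / length:.2f}",   # percent identity
                str(length),
                str(length - identity - gaps),        # mismatches
                str(count_gap_openings(elem.findtext("Hsp_qseq"),
                                       elem.findtext("Hsp_hseq"))),
                elem.findtext("Hsp_query-from"),
                elem.findtext("Hsp_query-to"),
                elem.findtext("Hsp_hit-from"),
                elem.findtext("Hsp_hit-to"),
                elem.findtext("Hsp_evalue"),
                elem.findtext("Hsp_bit-score"),
            ]
            out.write("\t".join(fields) + "\n")
            rows += 1
        elif elem.tag == "Iteration":
            # Drop finished queries so memory use stays roughly constant.
            if container is not None:
                container.clear()
            else:
                elem.clear()
    return rows
```

To use it, open an output file and call, for example, `blast_xml_to_m8("alignments.xml", open("alignments.m8", "w"))`, where both file names are placeholders. It would be worth trying it on a small piece of the data first and comparing a few lines against the XML by eye.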
I was able to load my file, but my computer lost power before I could save it (long story). I have tried to reload the file three times today, and MEGAN keeps crashing/closing about 30 to 60 minutes into the load. Any suggestions?
Are there any messages in the message window? Did the program produce a file called error.log in the MEGAN installation directory?