In your paper you are using MEGAN-LR (with LAST aligner) by mapping to NR protein database. This is working fine for me. However, I would like to try it out with a nucleotide database that I have taken from kraken tool, so that I can compare MEGAN to it.
On my first try, I didn’t get any reads aligned, however, by reading through the forums I found out that the headers in my database are not formatted as they should be for MEGAN. For example, one of the headers is:
I’m not trying to import kraken output, just use the same sequence database for aligning reads so that I can compare results from Kraken and MEGAN. Kraken simply downloads sequences from RefSeq database. The files are in FASTA format and have headers in the format that I wrote in the last post. That header seems to contain taxid but it also has some other text before that. I will reformat the headers and see if MEGAN works, but do you see any other problem with running MEGAN-LR with a nucleotide database?
Yes, MEGAN is happy to process nucleotide alignments.
In your case, if you remove
kraken:
and leave
taxid|xxxxx
then MEGAN can perform taxonomic assignment if you select “Extended Mode” and “Id Parsing” on the Taxonomy tab of the Import Blast dialog.
However, the following simpler solution should also work:
When selecting Id Parsing, you can specify the tags used by MEGAN to identify Ids. For taxonomy, the default tags are tax| taxonomy|.
If you specify kraken:taxid| as a tag, then you will not have to edit the lines in the input files.
Before this script I mapped reads with LAST to produce maf file and converted that to daa using maf2daa script. Then I also ran daa-meganizer with the option -lg for long reads.
Can MEGAN output relative abundancies of organisms in the sample or just the read counts?
I really appreciate your help. To give you a little more information, we are currently testing several metagenomic classifiers on long read data and MEGAN is one of them.
This is how you turn on idParsing and set the tag prefix to kraken:taxid| :
set idParsing=true cName=Taxonomy prefix=‘kraken:taxid|’;
The run “import from blast” from the command-line in MEGAN UE, the easiest way to figure out the commands is to do this once from the GUI and then copy the command from the message window, as MEGAN echos all interactive commands there.
At present, MEGAN can output read counts, total bases or total aligned bases. If you want relative read counts for multiple samples, then you need to open those samples together in a single document using File->Compare and select relative counts when setting up the comparison.
Note that relative read counts is not the same as relative abundances, as the former doesn’t take genome sizes into account…
I’ve been quite busy and didn’t get to test this until yesterday. Unfortunately, I still didn’t manage to get MEGAN to classify anything. I have trouble testing this with the MEGAN GUI because all my data is on a GUI-less server.
I want to repeat all my steps to make clear what exactly I’m doing. I hope you can check this out and maybe see what I’m doing wrong.
STEP 1. Map reads to a reference database using Last aligner. This produces quite large .maf file. The reference database is a .fna file with headers like this:
‘>kraken:taxid|272844|NC_000868.1 Pyrococcus abyssi GE5, complete genome’
STEP 2. Convert .maf file to .daa file using maf2daa tool. This produces much smaller .daa file that can be loaded into MEGAN.
STEP 3. Prepare results for analysis with MEGAN using daa-meganizer tool. Set option for long reads (-lg).
This is where I might be doing something wrong. The output of daa-meganizer tool says that it only managed to classify into two classes (as I can understand it). Last part of daa-meganizer output:
As previously mentioned, you will have to use a different mechanism for identifying taxa, namely make use of the fact that your reference sequences already contain a taxon id in them:
As mentioned, MEGAN allows parsing of such ids to be turned on.
Unfortunately, daa-meganizer was missing the necessary option for turning on taxon-id parsing. I have fixed this. I will upload a new release of MEGAN 6.18.11 in which daa-meganizer has a new option:
-t4t, --tags4taxonomy [string] Tags for taxonomy id parsing (must set to activate id parsing).
Here you must specify:
-t4t taxid|
So, your daa-meganizer call should then look like this (using quotes to protect the pipe character ‘|’):