To work with a large number of large metagenomic shotgun samples, proceed as follows.
Let us assume that you have just received a hard-disk containing a collection of fastq.gz files, each representing one sample from your study.
Put these files into a directly called 00fastq.
Create a second directory called 10daa. This will contain DAA files generated by DIAMOND.
Create a third directory called 20rma. This will contain MEGAN RMA files.
For each file reads.fastq.gz in your 00fastq directory, run DIAMOND as follows:
diamond blastx --query 00fastq/reads.fastq.gz --db nr --daa 10daa/reads.daa
When this has completed, for each file reads.aa in your 10daa directory, run daa2rma as follows:
Hi Daniel,
When converting .daa files in MEGAN using the command as you wrote (daa2rma -i 10daa/reads.daa -o 20rma/reads.rma -g2t -g2t gi_taxid.bin -g2kegg gi2kegg.bin -fun KEGG), how to specify paired end information?
I have parsed my paired sequences separately using diamond blastx, and got output as reads_1.daa and reads_2.daa
thanks,
Suparna
the both daa-meganizer and daa2rma currently do not support explicit handling of paired reads. This is only available when using the MEGAN interactive import dialog.
I will update the daa2rma program so that it can explicitly handle paired reads in the near future.
I have the same question as Suparna, I have Illumina paired read data, so after I run Blastx using Diamond, I cannot use this data unless I use the GUI? And how exactly do I do that? Also, a separate question when I tried to load the Blastx data in XML format to the GUI, the application stopped responding and I couldn’t load that data.
Inside the MEGAN installation directory you will find a tools directory that provides a number of command line programs for processing alignment files such as produced by DIAMOND, BLAST, Last etc.
Is there any important difference between daa-meganizer and daa2rma? After running both on the same daa file, I couldn’t notice any significant changes except for slight differences in reads assigned/summarized and the rma file being substantially smaller.
meganizing a DAA file takes much less time that computing an RMA file, and producing an RMA file allows use instruct MEGAN to explicitly use mate pair information (although the current algorithm appears to be quite conservative, so I’m not sure how useful that is at present.) Both types of files, meganized-DAA and RMA, should be equivalent (except that the RMA file might contain less matches if one sets the limit of alignments to keep for a given read to something that is smaller to what the limit was set to for DIAMOND)
a) just looking at the help output of daa2rma, support for paired reads appears to be implemented… I don’t remember, did you report a problem with the feature?
b) MEGAN uses some multithreading, but jobs like computing an RMA file from a DAA file is limited mostly by IO, so multithreading is of limited use.
Thanks for your msg…
Sorry if I understood correctly, is this mean that now there is a option for daa2rma using paired reads in comandline too as can be done in GUI? Then how?
Thanks a lot,
Suparna
Here are the first few lines of the help displayed by the program:
SYNOPSIS
DAA2RMA6 [options]
DESCRIPTION
Computes a MEGAN .rma6 file from a DIAMOND .daa file
OPTIONS
Input
-i, --in [string(s)] Input DAA file. Mandatory option.
-mdf, --metaDataFile [string(s)] Files containing metadata to be included in RMA6 files.
Output
-o, --out [string(s)] Output file(s), one for each input file, or a directory. Mandatory option.
-c, --useCompression Compress reads and matches in RMA file (smaller files, longer to generate. Default value: true.
Reads
-mag, --magnitudes Reads are annotated with magnitudes. Default value: false.
-p, --paired Reads are paired. Default value: false.
-ps, --pairedSuffixLength [number] Length of name suffix used to distinguish between name (i.e. first word in header) of read and its mate (use 0 if read and mate have same name). Default value: 0.
-pof, --pairedReadsInOneFile Are paired reads in one file (usually they are in two). Default value: false.
Use -p to tell the program that reads are paired, use -ps to provide the length of the suffix which is used to distinguish paired reads (e.g. use 1 if pairs are named XXX.1 and XXX.2).
Use -pof to tell the program that paired reads appear together in one file (otherwise you must provide two files).
The paired read mode is overly conservative at the moment, pairs of reads are assigned to the LCA of the LCA of each. I will implement a second strategy (based on adding bitscores) in the near future.
I am currently using the trial version of MEGAN6_UE and I cannot get your script to work on my dataset. It complains about the key…
[grendon@compute-5-19 20170417_MEGAN6]$ cat megan6_script.sh
open file='/.../results/20170417_MEGAN6/533AC_R1.paired.rma';
uncollapse nodes=all;
select nodes=leaves;
export what=CSV format=taxonName_to_count separator=comma counts=summarized file='/.../results/20170417_MEGAN6/533AC_R1.paired.csv';
quit;
[grendon@compute-5-19 20170417_MEGAN6]$ MEGAN -g -E < megan6_script.sh
Version MEGAN Ultimate Edition (version 6.7.18, built 19 Apr 2017)
Copyright (C) 2016 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRAN
TY.
3D PCoA viewer not supported on this platform
Error: Invalid license key
It looks like the license key has not been entered? You can supply it on the command line using the -rc option.
You only need to do that once, MEGAN will “remember” the code.
My guess is that your file nr.gz is incomplete. Please try gunzip first so as to get nr.fa and then run diamond with nr.fa
If that doesn’t help, please contact Benjamin Buchfink via the DIAMOND GitHub page.