Generic pipeline using DIAMOND and MEGAN6

As far as I know, DIAMOND doesn’t run on Windows.

Hi,

I am running MEGAN6. My query file has 120,000 amino acid sequences and I am doing diamond blastp. I will run daa2rma on the blastp output, and include KEGG, SEED, COG and others.

Lengths of my amino acid sequences are in the range of 12-2,000. My question is: should I change any default parameters?

Thanks!

Can I get a bit more description?
Thanks a ton!

Is it possible to run KEGG and SEED without the license key in MEGAN CE? I tried to run KEGG, but was unsuccessful because of the required license key…

Unfortunately, KEGG requires a license, so while you can run KEGG using MEGAN CE, it uses a KEGG version dating back to 2011…
The current KEGG mapping file only works with MEGAN UE.

Thank you very much!

Hi Daniel,

Does this generic DIAMOND+MEGAN workflow require the UE edition? I have generated the .daa files using DIAMOND, but I cannot get it to work in the Community Edition of MEGAN6.

Cheers

No, the generic DIAMOND+MEGAN pipeline uses the Community Edition.

@Daniel

Hi Daniel, I have the Ultimate Edition and just ran a blastp search in DIAMOND on 24 files (divided into 4 samples). I am a little confused about what I am supposed to do now. I have been reading over the discussion here and see that I need to “meganize” my .daa files (well, the extension is .out, but I suppose I can change it to .daa since they are just text files). I see an option in the GUI for meganizing, but I am unfamiliar with some of the other options. Should I be following the same workflow as you stated above in the original posts, or do I need to do something different?

You need to run DIAMOND so that it produces a .daa file, using format 100.
To meganize the files, you can either use the GUI and the Meganize command, or use the command-line program tools/daa-meganizer.
The key parameters are:

  • Long reads. If you are processing long reads or contigs, use the long read mode (but you should then also have used the long read mode of DIAMOND).
  • Mapping files. MEGAN uses mapping files that map NCBI accessions to taxa and functions. You need to download these from the MEGAN6 download webpage and then set them as options.
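
The two steps above can be sketched as a dry-run shell script that only prints the commands instead of running them. This is a sketch under assumptions, not a definitive recipe: the file names (proteins.faa, sample.daa, megan-map.db) are placeholders, and the -mdb and -lg options of daa-meganizer are assumed to match your MEGAN6 release, so please check `daa-meganizer -h` and `diamond help` before running anything.

```shell
#!/bin/sh
# Dry run: print the commands rather than executing them.
# All file names are placeholders (assumptions, not taken from the thread).

diamond_cmd() {
    # $1 = protein FASTA, $2 = DIAMOND database, $3 = output file
    # --outfmt 100 asks DIAMOND for its binary DAA format
    echo "diamond blastp --query $1 --db $2 --outfmt 100 --out $3"
}

meganize_cmd() {
    # $1 = DAA file, $2 = MEGAN mapping database, $3 = "long" for long reads
    cmd="daa-meganizer -i $1 -mdb $2"
    if [ "${3:-}" = "long" ]; then
        cmd="$cmd -lg"   # assumed long-read switch; verify with -h
    fi
    echo "$cmd"
}

diamond_cmd proteins.faa nr sample.daa
meganize_cmd sample.daa megan-map.db
```

Once the printed commands look right for your installation, they can be pasted into a terminal or the `echo` calls can be removed.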

Can I ask what parameters were used in this pipeline? I am interested in using MEGAN to get the taxonomic classification of reads at the genus level. I found papers citing MEGAN parameters (e.g. Min support percent: 80, Min support: 15, Min complexity filter: 0.3, LCA algorithm: weighted). I couldn’t find these parameters in your code.

They are on the LCA Parameters tab.
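
For command-line use, the same kind of settings can probably be passed to daa-meganizer as options. The sketch below only prints a command; it assumes that -supp (min support percent), -sup (min support) and -alg (LCA algorithm) are the option names in your MEGAN6 release, and the file names and values are placeholders rather than recommendations. Check `daa-meganizer -h` for the actual names and defaults.

```shell
#!/bin/sh
# Dry run: print a daa-meganizer call with explicit LCA settings.
# Option names are assumptions; values and file names are placeholders.

lca_meganize_cmd() {
    # $1 = DAA file, $2 = mapping database,
    # $3 = min support percent, $4 = min support, $5 = LCA algorithm
    echo "daa-meganizer -i $1 -mdb $2 -supp $3 -sup $4 -alg $5"
}

lca_meganize_cmd sample.daa megan-map.db 0.05 15 weighted
```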

What is the next step after getting the .rma file, if I want to get the taxonomy information using the MEGAN6 Community version?

Open the file in MEGAN… explore interactively… Use the File->Export menu items to export to different formats…
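
Without the GUI, a similar export may be possible with the rma2info program in MEGAN's tools directory. The sketch below only prints a command; it assumes that rma2info accepts -i (input), -c2c (class-to-count for a named classification) and -o (output file) in your release, and the file names are hypothetical. Run `rma2info -h` to confirm the options first.

```shell
#!/bin/sh
# Dry run: print an rma2info call that would write per-taxon read counts.
# Option names are assumptions; file names are placeholders.

export_counts_cmd() {
    # $1 = RMA file, $2 = classification name, $3 = output text file
    echo "rma2info -i $1 -c2c $2 -o $3"
}

export_counts_cmd sample.rma Taxonomy taxonomy-counts.txt
```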

What if I am using MEGAN_Community on Linux? There is no "File->Export" menu. I have not found a command-line method for the Linux version of MEGAN_Community, and I am confused about it. I have installed MEGAN_Community on Linux, but I have no idea how to use it on the command line to get the taxonomy information. The methods in the manual are for the interface version or the Ultimate (command-line) version, but there is no information for the Linux version of MEGAN_Community (I think it has to work on the command line, doesn't it?). Is MEGAN6_Community for Linux an interface version or a command-line version? I am really confused and need help.

Did you do the taxonomic classification based on only one file of the paired data? I am facing the same question as you now.

Many thanks for this description.

Can you please provide links to download gi_taxid.bin and gi2kegg.bin?

Also, how can I get 10daa/reads.daa?

I am working with long reads; do I need to add a specific parameter for that, like --longReads?

Maybe it would be possible to demonstrate the full pipeline on the MEGAN YouTube channel. Otherwise, I would be very thankful if you could answer the two questions above.

Thank you for your support

Unfortunately, mapping using GI numbers is no longer supported.

Hello,
I have been following the generic pipeline for generating a .megan file. I have 2 samples and have already generated the .megan file for one of them. However, the next sample has been stuck at the .daa file generation step for a while now.
The command used is:
~/Tools/diamond/bin/diamond blastx --query ../00fastq/sample.fq.gz --db nr --daa ./sample.daa

The sample file size is ~500 MB. The log file shows the following messages:

Masking low complexity seeds… [0.305s]
Searching alignments… [75.809s]
Deallocating buffers… [0.564s]
Clearing query masking… [7.9s]
Opening temporary output file… [0s]
Computing alignments… [2472.54s]
Deallocating reference… [0.333s]
Loading reference sequences… [16.142s]
Masking reference…

I understand this means the process is still running, but the slow progress concerns me: it has been more than 3 days, whereas the other sample of the same size took only a day. I can't tell whether it has stopped or not. A .daa file has been created under the sample name, but it is empty. The nr database is the same as the one mentioned in the pipeline. I hope you can help me with this.

Thank you.

Please address questions regarding the DIAMOND program to Benjamin Buchfink via the DIAMOND GitHub page.