Generic pipeline using DIAMOND and MEGAN6

Hi Daniel,

I’ve tried the above guidelines, but when I import my .rma files into MEGAN, none of the reads are assigned to anything. There is always a long line between the root and the bubble that says “not assigned”, with nothing in between. Do you have any suggestions as to what might be causing this? Am I perhaps using the wrong GI-to-NCBI mapping files? I downloaded them from here: http://ab.inf.uni-tuebingen.de/data/software/megan6/download/welcome.html .

Thanks!

GI numbers are no longer supported by NCBI. Only use a GI mapping file if you are using a very old NCBI database. For recent data, use the accession mapping file.
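
If you are not sure which kind of identifiers your reference database carries, a quick look at its FASTA headers can help. The following is only a minimal sketch: it assumes that old-style headers start with `>gi|<number>|`, and the headers and sequences in `example` are made up for illustration.

```python
import re

# Assumption: old-style NCBI FASTA headers begin with ">gi|<digits>|".
GI_HEADER = re.compile(r"^>gi\|\d+\|")


def header_style(header: str) -> str:
    """Return 'gi' for an old GI-style FASTA header, otherwise 'accession'."""
    return "gi" if GI_HEADER.match(header) else "accession"


def count_header_styles(lines):
    """Count GI-style and accession-style headers in an iterable of FASTA lines."""
    counts = {"gi": 0, "accession": 0}
    for line in lines:
        if line.startswith(">"):
            counts[header_style(line)] += 1
    return counts


# Made-up example records, not real database entries.
example = [
    ">gi|12345|ref|XP_000001.1| example protein",
    "MKVLAAGIT",
    ">XP_000002.1 example protein",
    "MSTNPKPQR",
]
print(count_header_styles(example))
```

If most headers are counted as `accession`, the accession mapping file is the one to use.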

I want to install DIAMOND on a Windows 10 computer.

I visited

but could not find out how to install DIAMOND on a Windows computer.

Where can I find the .exe or an installer for DIAMOND?

As far as I know, DIAMOND doesn’t run on Windows.

Hi,

I am running MEGAN6. My query file has 120,000 amino acid sequences and I am doing diamond blastp. I will run daa2rma on the blastp output, and include KEGG, SEED, COG and others.

The lengths of my amino acid sequences range from 12 to 2,000. My question is: should I change any default parameters?

Thanks!

Can I get a bit more description?
Thanks a ton!

Is it possible to run KEGG and SEED without the license key in MEGAN CE? I tried to run KEGG but was unsuccessful because of the required license key…

Unfortunately, KEGG requires a license, so while you can run KEGG using MEGAN CE, it uses a KEGG version dating back to 2011…
The current KEGG mapping file only works with MEGAN UE.

Thank you very much!

Hi Daniel,

Does this generic DIAMOND+MEGAN workflow require the Ultimate Edition (UE)? I have generated the .daa files using DIAMOND, but I cannot get it to work in the Community Edition of MEGAN 6.

Cheers

No, the generic DIAMOND+MEGAN pipeline uses the Community Edition.

@Daniel

Hi Daniel, I have the Ultimate Edition and just ran a blastp search in DIAMOND on 24 files (divided into 4 samples). I am a little confused about what I am supposed to do now. I have been reading over the discussion here and see that I need to “meganize” my .daa files (well, the extension is .out, but I suppose I can change it to .daa since they are just text files). I see an option in the GUI for meganizing, but I am unfamiliar with some of the other options. Should I follow the same workflow as you described in the original posts above, or do I need to do something different?

You need to run DIAMOND so that it produces a .daa file, using output format 100.
To meganize the files, you can either use the Meganize command in the GUI or use the command-line program tools/daa-meganizer.
The key parameters are:

  • Long reads. If you are processing long reads or contigs, use the long read mode (but you should then also have used the long read mode of DIAMOND).
  • Mapping files. MEGAN uses mapping files that map NCBI accessions to taxa and functions. You need to download these from the MEGAN6 download webpage and then set them as options.
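
The two steps above can be sketched as command lines. This is only a sketch under assumptions, not the exact commands of the original pipeline: the file names (`reads.faa`, `nr`, `reads.daa`, `megan-map.db`) are hypothetical placeholders, and the flags shown (`--outfmt 100` and `--long-reads` for DIAMOND; `-i`, `-mdb` and `-lg` for daa-meganizer) should be checked against `diamond help` and `daa-meganizer -h` for your installed versions. The code only builds and prints the commands; it does not run anything.

```python
import shlex


def diamond_command(query, database, out_daa, mode="blastp", long_reads=False):
    """Build a DIAMOND call that writes a .daa file (output format 100)."""
    cmd = ["diamond", mode,
           "--query", query,
           "--db", database,
           "--out", out_daa,
           "--outfmt", "100"]  # 100 = DAA, the format MEGAN can meganize
    if long_reads:
        # Assumed flag name; verify with `diamond help`.
        cmd.append("--long-reads")
    return cmd


def meganizer_command(daa_file, mapping_db, long_reads=False):
    """Build a daa-meganizer call that adds classifications to a .daa file."""
    # Assumed flags: -i (input), -mdb (mapping database), -lg (long reads).
    cmd = ["daa-meganizer", "-i", daa_file, "-mdb", mapping_db]
    if long_reads:
        cmd.append("-lg")
    return cmd


# Hypothetical file names, for illustration only.
align = diamond_command("reads.faa", "nr", "reads.daa")
meganize = meganizer_command("reads.daa", "megan-map.db")
print(shlex.join(align))
print(shlex.join(meganize))
```

Once the paths point at real files, each list can be passed to `subprocess.run(cmd, check=True)`. Note that renaming a text output file to .daa will not work here, since the .daa file has to be written by DIAMOND itself in format 100.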

Can I ask what parameters were used in this pipeline? I am interested in using MEGAN to get the taxonomic classification of reads at the genus level. I found papers citing MEGAN parameters (e.g. Min support percent: 80, Min support: 15, Min complexity filter: 0.3, LCA algorithm: weighted). I couldn’t find these parameters in your code.

They are on the LCA Parameters tab:

What is the next step after getting the rma file, if I want to get the taxonomy information using the MEGAN6_Community version?

Open the file in MEGAN… explore interactively… Use the File->Export menu items to export to different formats…

What if I am using MEGAN_Community on Linux? There is no “File->Export” menu, and I have not found a command-line method for the Linux version of MEGAN_Community. I have installed MEGAN_Community on Linux, but I have no idea how to use it on the command line to get the taxonomy information. The methods in the manual are for the GUI version or the Ultimate (command-line) version, but there is no information for the Linux version of MEGAN_Community (I think it has to work on the command line, doesn’t it?). Is MEGAN6_Community for Linux a GUI version or a command-line version? I am really confused and need help.

Did you do the taxonomic classification based on only one set of paired data? I am facing the same question as you now.

Many thanks for this description.

Can you please provide links to download gi_taxid.bin and gi2kegg.bin?

Also, how can I get 10daa/reads.daa?

I am working with long reads. Do I need to add a specific parameter for that, such as --longReads?

Maybe it would be possible to demonstrate the full pipeline on a YouTube channel for MEGAN. Otherwise, I would be very thankful if you could answer the questions above.

Thank you for your support