Best raw data for DIAMOND and MEGAN

Hello, I’m a newbie. I’ve been trying to use MEGAN for my metagenome study, but it is quite hard.

I have 48 sequence data files from environmental samples (feces and soil) for a metagenome study, sequenced on the Illumina NextSeq platform (2×150 bp, paired-end). After quality filtering and adapter trimming, I have 3 sequence files (forward, reverse, and single).

Then I tried several methods before importing into MEGAN.

First, the 3 sequence files were assembled using metaSPAdes. The assembled scaffolds.fasta files were analyzed using DIAMOND for taxonomic assignment. I then converted the resulting .daa file with daa2rma and opened it in MEGAN.

Second, only the forward and reverse sequence files were merged using the overlap-merging software FLASH. The merged reads were then analyzed using DIAMOND for taxonomic assignment. I then converted the resulting .daa file with daa2rma and opened it in MEGAN.

Actually, I want to see the number of mapped reads, something like RPKM. But I’m not sure which approach is more appropriate, and I don’t know how to get this from the MEGAN results.

Can anyone tell me which sequence preprocessing approach is best for MEGAN, and how to analyze the percentage of mapped reads? Thank you.

  1. If you want MEGAN to be aware of the number of reads present in a contig, then put a “magnitude” statement on the header line of the contig, e.g.

>contig66|magnitude|900
accgcttcgacacact…

would set up a contig with magnitude 900.
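As a rough illustration of the magnitude header described above, here is a minimal Python sketch that rewrites FASTA headers into that form. The function name `add_magnitude` and the `read_counts` dictionary (contig name mapped to number of reads) are hypothetical; how you obtain the per-contig read counts, for example by mapping reads back to the contigs, is up to you and is not covered here.

```python
def add_magnitude(fasta_lines, read_counts):
    """Yield FASTA lines, appending '|magnitude|N' to headers with a known count.

    fasta_lines: iterable of lines from a FASTA file.
    read_counts: dict mapping contig name -> read count (hypothetical input).
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the contig name;
            # any description after it is dropped in this sketch.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            count = read_counts.get(name)
            if count is not None:
                line = f">{name}|magnitude|{count}"
        yield line


example = [">contig66\n", "accgcttcgacacact\n"]
for out_line in add_magnitude(example, {"contig66": 900}):
    print(out_line)
```

Headers without an entry in `read_counts` are passed through unchanged, so contigs with unknown counts are left as they were.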

  2. For long reads, please use DIAMOND’s “long-read” mode.
  3. For assembled contigs, please use the daa-meganizer (not daa2rma) and set it to long-read mode.

Points 2 and 3 are discussed in our recent paper, here.

Thank you so much for your advice. I will try.