Best raw data for DIAMOND and MEGAN

Jongbin · April 25, 2019, 5:31am

Hello, I’m a newbie. I’ve been trying to use MEGAN for my metagenome study, but it is quite hard.

I got 48 sequence data files from environmental samples (feces and soil) for a metagenome study, sequenced on the Illumina NextSeq platform (2×150 bp, paired-end). After quality filtering and adapter trimming, I got 3 sequence files (forward, reverse, and single).

Then I tried several methods before importing into MEGAN.

First, the 3 sequence files were assembled using metaSPAdes. The assembled scaffolds.fasta files were analyzed using DIAMOND for taxonomic assignment. Then I used daa2rma to convert the resulting .daa file for MEGAN.

Second, only the forward and reverse sequence files were merged using the overlap-based read merger FLASH. The merged reads were then analyzed using DIAMOND for taxonomic assignment, and I again used daa2rma to convert the resulting .daa file for MEGAN.

What I actually want to see is the number of mapped reads, something like RPKM. But I’m not sure which approach is more appropriate, and I don’t know how to get that from the MEGAN results.

Can anyone tell me which sequence preprocessing approach is best for MEGAN, and how to analyze the percentage of mapped reads? Thank you.

Daniel · May 2, 2019, 9:49am

1. If you want MEGAN to be aware of the number of reads present in a contig, then put a “magnitude” statement on the header line of the contig, e.g.

>contig66|magnitude|900
accgcttcgacacact…

would set up a contig with magnitude 900.

2. For long reads, please use DIAMOND’s “long-read” mode.
3. For assembled contigs, please use the daa-meganizer (not daa2rma) and set it to long-read mode.

Points 2 and 3 are discussed in our recent paper here.
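
For anyone wanting to script the “magnitude” headers described above, here is a minimal Python sketch. It assumes you already have a per-contig read count (for example, from mapping the reads back to the contigs, which is not covered here). The function name add_magnitudes and the read_counts dictionary are hypothetical and are not part of MEGAN or DIAMOND. Only the >name|magnitude|N header layout is taken from the example above.

```python
def add_magnitudes(fasta_lines, read_counts):
    """Yield FASTA lines, rewriting headers of known contigs to '>name|magnitude|N'.

    fasta_lines: iterable of lines from a FASTA file.
    read_counts: dict mapping contig name -> number of reads (assumed to be
                 computed elsewhere; hypothetical input for this sketch).
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Treat the first whitespace-delimited token as the contig name.
            name = line[1:].split()[0]
            count = read_counts.get(name)
            if count is not None:
                # Note: any description after the name is dropped here.
                line = f">{name}|magnitude|{count}"
        yield line


if __name__ == "__main__":
    example = [">contig66", "accgcttcgacacact", ">contig67", "ttgacc"]
    counts = {"contig66": 900}
    for out in add_magnitudes(example, counts):
        print(out)
```

Contigs without an entry in read_counts are passed through unchanged, so it is easy to spot which headers did not get a magnitude.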

Jongbin · May 7, 2019, 7:11am

Thank you so much for your advice. I will try.