Paired end read analysis diamond-MEGAN

Ally · November 5, 2019, 2:40pm

Hi,

I’m new to bioinformatics and metagenomic analysis, and I have been asked to analyse paired-end metagenomic sequences using diamond and MEGAN. Does anyone know the best way to go about this? I was thinking of aligning the two read files to the NCBI database separately using diamond, giving me two outputs per sample. Then I could select paired reads when importing from BLAST and select both diamond outputs for the sample. This should give me one MEGAN file per sample that I can then compare to the other samples after they have all been merged and meganised, right? Do I need to meganise the samples all separately, or is there a faster way via bash commands that allows paired reads to be specified?

Does anyone have any experience or advice? I have over 150GB of data to analyse and I’m worried I should be doing a number of steps before using diamond, but I’m not sure what they would be.

Thanks in advance.

Daniel · November 25, 2019, 9:39am

Unfortunately, due to the design of the DAA format, meganizer does not support paired-end analysis of DAA files.
The only way to do a paired-end analysis is to use the daa2rma tool, which creates a single new .rma file from your two input .daa files. This program has command-line options to set up paired-read import. Use the options:

--paired --pairedSuffixLength <length>

Here <length> is the number of trailing letters in the first word of the read name that distinguish the two reads of a pair. If the two reads (in the two separate files) have exactly the same name (that is, the same first word in the header line), set the length to 0; if they differ by one letter, set it to 1; and so on.
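
If you are unsure which value to use, a small script can work it out from one header line of each file. The sketch below is only an illustration of the rule described above and is not part of MEGAN or diamond; the function names and the example read names are made up, and it assumes both names have the same length and differ only at the end.

```python
def first_word(header: str) -> str:
    """Return the read name: the first word of a FASTQ/FASTA header line."""
    header = header.strip()
    # Drop the leading '@' (FASTQ) or '>' (FASTA) marker if present.
    if header[:1] in ("@", ">"):
        header = header[1:]
    parts = header.split()
    return parts[0] if parts else ""


def paired_suffix_length(name1: str, name2: str) -> int:
    """Count the trailing letters that distinguish two paired read names.

    Assumption: the names have equal length and share a common prefix,
    so the suffix is everything from the first mismatch onwards.
    """
    if len(name1) != len(name2):
        raise ValueError("read names differ in length; check the headers by hand")
    common = 0
    for a, b in zip(name1, name2):
        if a != b:
            break
        common += 1
    return len(name1) - common


if __name__ == "__main__":
    # Hypothetical header lines, one from each of the two read files.
    r1 = first_word("@SRR000001.1/1 length=100")
    r2 = first_word("@SRR000001.1/2 length=100")
    print(paired_suffix_length(r1, r2))  # names differ only in the last letter: 1
```

Take the first header line from each of the two read files, pass them through `first_word`, and use the result of `paired_suffix_length` as <length>. If the script raises an error, the names are not directly comparable and are worth inspecting by eye.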