Processing Reads from Multiple Lanes


I have been using the generic pipeline suggested in the Tutorials section to process a metagenomics dataset that I recently received.

Samples were sequenced on two lanes of a HiSeq. Thus, I have four sets of reads per sample, i.e. Reads 1 and 2 from Lane 1 and Reads 1 and 2 from Lane 2.

My workflow has been the following:

  1. Adapter and quality trimming of all four files per sample
  2. Concatenate all four files into a single file
  3. Run DIAMOND on this single file
  4. Then use MEGAN6 for further processing

Can anyone please advise if Step 2 is appropriate in my case or is it better to process each lane separately?

You could first run all four files separately through DIAMOND and then import the four resulting DAA files separately into MEGAN to see whether the four read sets show highly similar taxonomic and functional profiles. (If they don’t, then there is a problem with the sequencing.)
You can then reimport all four DAA files simultaneously into a single RMA file that then contains all reads.
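
To put a number on "highly similar", one option is a Bray-Curtis dissimilarity on relative abundances. The sketch below is only an illustration: it assumes the per-taxon read counts for each lane/read file have already been exported from MEGAN into plain name-to-count mappings (the export step is not shown), and the function names are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two profiles.

    Computed on relative abundances so that different sequencing depths
    per lane do not inflate the value. 0 means identical profiles,
    1 means no shared taxa.
    """
    a = relative_abundance(counts_a)
    b = relative_abundance(counts_b)
    taxa = set(a) | set(b)
    # Both profiles sum to 1, so the denominator is 2.
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa) / 2.0
```

Values close to 0 for every pair of files would support pooling them; what counts as "close" is a judgment call that depends on the dataset.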

However, importing DAA files into MEGAN takes quite a long time. The alternative is to concatenate all four reads files into a single DIAMOND input file, as you suggest, then run DIAMOND on that file, and then use daa-meganizer to meganize the resulting DAA, which will allow you to open the DAA file directly in MEGAN. This is the fastest option.
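
For the concatenation step, a minimal Python sketch is given here. It assumes the four trimmed files share one format, either all plain text or all gzip-compressed (gzip streams appended back to back still form a valid gzip file), and that plain-text files end with a newline. The function name is a hypothetical placeholder.

```python
import shutil
from pathlib import Path


def concatenate_reads(inputs, output):
    """Append the input read files to `output`, one after another.

    Bytes are copied unchanged, so this works for plain FASTQ/FASTA and
    for gzip files alike, as long as all inputs use the same format.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large read files never sit in memory.
                shutil.copyfileobj(src, out)
    return Path(output)
```

Calling it with the four trimmed files of a sample, in any fixed order, produces the single DIAMOND input file described above.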

Hi Daniel

Thank you for the reply. I am proceeding with the latter. However, is there any way to provide the inputs to DIAMOND as PE reads instead of treating paired reads as SE reads?

When I ran the meganizer step, I could see that there is an option to specify whether reads are paired.

Further, I have been trying to run DIAMOND as efficiently as possible. However, the GitHub page states that --tmpdir is set to the output folder by default, whereas the user manual states that --tmpdir is set to /dev/shm by default. Which of these is correct? I have 256G of RAM on my server and would like to know which option would give the best DIAMOND performance.

Thanks in advance

As you have 256G of RAM, use /dev/shm for --tmpdir, which is a RAM disk.
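
Since files in /dev/shm are held in memory, it may be worth checking free space there before a run. The sketch below is a small helper under stated assumptions: the required size is something you have to estimate yourself (DIAMOND's temporary-file footprint depends on the input and settings), and the function name and fallback list are hypothetical.

```python
import shutil


def pick_tmpdir(candidates, needed_bytes):
    """Return the first directory in `candidates` with enough free space.

    `needed_bytes` is the caller's own estimate of the temporary space
    required. Directories that do not exist are skipped.
    Returns None if no candidate qualifies.
    """
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            # Missing or inaccessible directory: try the next candidate.
            continue
        if free >= needed_bytes:
            return path
    return None
```

For example, `pick_tmpdir(["/dev/shm", "/tmp"], 50 * 1024**3)` would prefer the RAM disk and fall back to /tmp if less than 50 GiB is free there; the chosen path can then be passed to --tmpdir.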

Meganizer should be able to take mate pair info into account, as long as both reads of a pair are in the same file and either have different name-suffixes or are not adjacent in the file.
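
The name-suffix convention can be illustrated with a short sketch. This is not MEGAN's implementation, only an assumption-laden illustration of the idea: two reads are treated as mates when their names agree once a fixed number of trailing characters is removed. The function name is hypothetical.

```python
from collections import defaultdict


def pair_by_suffix(names, suffix_length):
    """Group read names that match after dropping the last `suffix_length` characters.

    Returns (paired, unpaired): `paired` maps the shared prefix to the
    two mate names, `unpaired` lists names without exactly one partner.
    """
    groups = defaultdict(list)
    for name in names:
        prefix = name[:-suffix_length] if suffix_length > 0 else name
        groups[prefix].append(name)
    paired = {p: members for p, members in groups.items() if len(members) == 2}
    unpaired = [n for members in groups.values() if len(members) != 2 for n in members]
    return paired, unpaired
```

With names such as `r1/1` and `r1/2`, a suffix length of 2 groups them under `r1`. Running this on a sample of your own read headers is a quick way to check which suffix length actually separates the mates.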

Thanks again! I have been following your suggestions.

I hit a roadblock while using daa-meganizer. On my previous attempt, with default settings, meganizer failed to identify my reads as paired, and the verbose output showed paired: false.

To overcome this, I set --paired true and --pairedSuffixLength 11. The following warning popped up:

WARNING: Not an RMA6 file, will ignore paired read information

Is there a way to overcome this warning without converting my daa file to an RMA6 file?

Sorry, my previous response was incorrect. Meganizing a DAA file currently does not take paired-end info into account.

Thanks Daniel. Can we expect this update to meganizer soon, or would you advise using the MEGAN6 > Import DAA option, where we can specify that reads are paired?

There will be no update in the foreseeable future, but it is on my list of things to do…

Hi Daniel,

Is there a specific way to concatenate the files, or should I append one after another?