MEGAN-LR + LAST protocol requiring a large amount of disk space

kkrizanovic · June 5, 2020, 10:57am

Hi,

I’ve been using MEGAN with LAST according to your MEGAN-LR paper. I’ve replaced the NR database with another set of sequences (a subset of the NT database). I appreciate the help that I’ve already gotten on this forum.

In this protocol, reads are mapped to a reference database using LAST. The MAF file produced by LAST is quite large: hundreds of GB in my case. I’ve tried piping the LAST output to the maf2daa tool. However, this also uses a lot of temporary disk space (if somewhat less than the MAF file itself). For a last run it took almost 1 TB, at which point my server ran out of disk :). I’ve been running LAST on 16 threads, not sure if that matters.

Is there a way to make the whole process use less disk?

Sincerely,
University of Zagreb,
Faculty of Electrical Engineering and Computing
Krešimir Križanović

Daniel · June 5, 2020, 3:24pm

If you are using LAST to align against proteins, then please consider using DIAMOND instead. DIAMOND now has a special mode for long reads (published in https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0665-y).
Unfortunately, our MAF2DAA converter does make heavy use of temporary files…

kkrizanovic · June 6, 2020, 1:07pm

The trouble is not when I’m mapping to the NR database; in that case the temporary files are manageable. The problem arises when mapping to a subset of the NT database.

I could try mapping with other mapping tools that produce SAM format (e.g. Minimap2). I can convert that to rma I think. I’ll have to check that option out.

Thank you for your response. I’ll see if I can get a bigger disk or use another mapper.

Daniel · June 8, 2020, 5:45am

To be clear, are you talking about mapping DNA-to-DNA with LAST? I believe that the maf2daa converter only works for DNA-to-protein alignments.

We have never looked into processing DNA-to-DNA alignments obtained with LAST. We do have experience with processing minimap2 alignments and should be able to help with issues there.
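
One generic way to shrink the intermediate alignment file discussed in this thread (MAF from LAST, or SAM from minimap2) is to compress the mapper's output while it streams, so the uncompressed text is never written to disk. The Python sketch below only illustrates that idea: the function name `compress_stream` and the file names are hypothetical, and whether a given downstream MEGAN tool accepts gzipped input for that format is an assumption that needs checking. It does nothing about the temporary files that maf2daa itself creates.

```python
import gzip
import shutil
import sys


def compress_stream(src, dest_path, level=6):
    """Copy a binary stream into a gzip file in 1 MiB chunks.

    Only one chunk is held in memory at a time, so the input
    can be far larger than RAM.
    """
    with gzip.open(dest_path, "wb", compresslevel=level) as dest:
        shutil.copyfileobj(src, dest, length=1024 * 1024)


# Hypothetical usage (mapper arguments omitted on purpose):
#   <mapper command> | python compress_stream.py alignments.maf.gz
if __name__ == "__main__" and len(sys.argv) == 2:
    compress_stream(sys.stdin.buffer, sys.argv[1])
```

Alignment text is highly repetitive, so gzip usually reduces it substantially, but the actual ratio depends on the data and is not something this sketch can promise.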