Hi,
I’ve been using MEGAN with LAST according to your MEGAN-LR paper. I’ve replaced NR database with another set of sequences (a subset of NT database). I appreciate the help that I’ve already gotten on this forum.
In this protocol, reads are mapped to a reference database using LAST. The MAF file produced by LAST is quite large, hundreds of GB in my case. I’ve tried piping the LAST output to the maf2daa tool. However, this also uses a lot of temporary disk space (if somewhat less than the MAF file itself). In my last run it took almost 1 TB before my server ran out of disk :). I’ve been running LAST on 16 threads; not sure if that matters.
Is there a way to make the whole process use less disk?
Sincerely,
University of Zagreb,
Faculty of Electrical Engineering and Computing
Krešimir Križanović
If you are using LAST to align against proteins, then please consider using DIAMOND instead. DIAMOND now has a special mode for long reads (published in https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0665-y).
Unfortunately, our MAF2DAA converter does make heavy use of temporary files…
The trouble is not when I’m mapping to the NR database; there the temporary files are manageable. The problem arises when mapping to a subset of the NT database.
I could try other mapping tools that produce SAM format (e.g. Minimap2). I can convert that to RMA, I think. I’ll have to check that option out.
Thank you for your response. I’ll see if I can get a bigger disk or use another mapper.
To be clear, are you talking about mapping DNA-to-DNA with LAST? I believe that the maf2daa converter only works for DNA-to-protein alignments.
We have never looked into processing DNA-to-DNA alignments obtained with LAST. We do have experience with processing minimap2 alignments and should be able to help with issues there.
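For the minimap2 route, here is a minimal Python sketch of one way to keep disk usage down: the SAM stream is gzip-compressed as it is produced, so the uncompressed alignments are never written to disk. The helper names (minimap2_command, run_compressed), the file names, and the map-ont preset are illustrative assumptions, not part of MEGAN or of the protocol in the paper. Whether the downstream SAM-to-RMA step reads gzipped SAM directly is also an assumption that should be checked first.

```python
import gzip
import shutil
import subprocess


def minimap2_command(reference, reads, threads=16, preset="map-ont"):
    """Build a minimap2 command line that writes SAM to stdout.

    -a selects SAM output, -x a preset, -t the thread count.
    The default preset is an assumption; pick the one matching your reads.
    """
    return ["minimap2", "-a", "-x", preset, "-t", str(threads), reference, reads]


def run_compressed(command, out_path):
    """Run a command and gzip its stdout on the fly into out_path."""
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc, \
            gzip.open(out_path, "wb") as out:
        # Copy in chunks so memory use stays small regardless of output size.
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


# Example call (hypothetical file names):
# run_compressed(minimap2_command("nt_subset.fasta", "reads.fastq"),
#                "alignments.sam.gz")
```

The same run_compressed helper could wrap any aligner that writes to stdout. It reduces the size of the stored alignments only, and does nothing about temporary files a converter creates internally.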