Using the MEGAN-LR pipeline on PacBio HiFi reads

kkrizanovic · June 18, 2021, 12:29pm

Hello,

I’ve been trying to test the MEGAN-LR pipeline (with the LAST aligner) to classify a metagenomic sample sequenced with PacBio HiFi technology.

However, when I convert the .maf file produced by LAST to .daa format using the maf2daa tool, I get a very small .daa file, which then results in zero classified reads.

For example, from a MAF file of over 500 GB, I get a DAA file of only 17 MB.
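A quick way to tell whether the problem lies in the MAF file itself or in the conversion step is to count how many alignment blocks and distinct reads the MAF file actually contains. The Python sketch below is not part of MEGAN or LAST; the function name summarize_maf is hypothetical, and it assumes LAST's MAF layout, in which the first "s" line of each block is the reference sequence and the second is the query read.

```python
from typing import Iterable, Set, Tuple


def summarize_maf(lines: Iterable[str]) -> Tuple[int, Set[str]]:
    """Count alignment blocks and distinct query (read) names in MAF text.

    Assumption: as in LAST output, the first 's' line of each block is the
    reference sequence and the second 's' line is the query read.
    """
    blocks = 0
    queries: Set[str] = set()
    s_lines = 0  # number of 's' lines seen so far in the current block
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank separators and comment/header lines
        if fields[0] == "a":
            blocks += 1  # each 'a' line opens a new alignment block
            s_lines = 0
        elif fields[0] == "s":
            s_lines += 1
            if s_lines == 2:
                queries.add(fields[1])  # second 's' line names the read
    return blocks, queries


if __name__ == "__main__":
    # Tiny made-up example; on real data, pass an open file handle instead,
    # e.g. summarize_maf(open("reads.maf")) with a hypothetical file name.
    demo = """# made-up header
a score=100
s ref1 0 10 + 1000 ACGTACGTAC
s read1 5 10 + 500 ACGTACGTAC

a score=90
s ref2 0 10 + 1000 ACGTACGTAC
s read2 0 10 + 400 ACGTACGTAC

a score=80
s ref3 0 10 + 1000 ACGTACGTAC
s read1 100 10 + 500 ACGTACGTAC
"""
    n_blocks, reads = summarize_maf(demo.splitlines())
    print(n_blocks, "alignment blocks covering", len(reads), "distinct reads")
```

The file is read line by line, so memory use grows only with the number of distinct read names, not with the size of the MAF file. If the read count is close to the number of input reads, the alignments are there and the loss is more likely happening during conversion.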

I have tried this for all three samples of the Zymo gut microbiome standard (https://www.ncbi.nlm.nih.gov/sra/?term=pacbio%20zymo%20d6331) and also for one ATCC synthetic sample (https://www.ncbi.nlm.nih.gov/sra/SRX8173258[accn]).

To be clear, I downloaded the data linked above and tested MEGAN on a subsampled portion of it. The subsampling step was checked and works correctly. Other tools, namely Kraken (which we used because it is fast), classify a significant number of these reads.

Also, I have not had a similar problem with the ONT Zymo mock community data downloaded from the Loman Lab pages (https://lomanlab.github.io/mockcommunity/).

Do you have any advice for me or can you suggest what I might be doing wrong?

Best regards,
Krešimir Križanović

Daniel · June 22, 2021, 6:10am

Dear Krešimir Križanović,

rather than using maf2daa, could you first try sorting the MAF file with sort-last-maf and then importing the sorted file with blast2rma?
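One way to see whether a MAF file is already grouped by read before importing it is to check that all alignment blocks for a given read appear consecutively. That grouping by read is presumably what sort-last-maf provides, but treat that as an assumption. The Python sketch below is a standalone check, not part of MEGAN; the function name non_contiguous_queries is hypothetical, and it again assumes LAST's layout, where the second "s" line of each block is the query read.

```python
from typing import Iterable, Optional, Set


def non_contiguous_queries(lines: Iterable[str]) -> Set[str]:
    """Return read names whose MAF blocks are not all adjacent to each other.

    Assumption (LAST layout): the second 's' line of each block is the query.
    An empty result means every read's alignments form one consecutive run.
    """
    finished: Set[str] = set()  # reads whose run of blocks has already ended
    broken: Set[str] = set()    # reads that reappear after their run ended
    current: Optional[str] = None
    s_lines = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank separators and comment/header lines
        if fields[0] == "a":
            s_lines = 0
        elif fields[0] == "s":
            s_lines += 1
            if s_lines == 2 and fields[1] != current:
                query = fields[1]
                if query in finished:
                    broken.add(query)
                if current is not None:
                    finished.add(current)  # the previous read's run just ended
                current = query
    return broken


if __name__ == "__main__":
    # Made-up blocks in the order read1, read2, read1, so read1 is reported.
    demo = """a score=100
s ref1 0 10 + 1000 ACGTACGTAC
s read1 5 10 + 500 ACGTACGTAC

a score=90
s ref2 0 10 + 1000 ACGTACGTAC
s read2 0 10 + 400 ACGTACGTAC

a score=80
s ref3 0 10 + 1000 ACGTACGTAC
s read1 100 10 + 500 ACGTACGTAC
"""
    print(non_contiguous_queries(demo.splitlines()))
```

Only the set of read names already seen is kept in memory, so the check can be run over a large file in one streaming pass.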

If that doesn’t help, please give me access to the first 10-100 MB of your sorted file and I will take a look at it.
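When sharing only the beginning of a large MAF file, it helps to cut at an alignment-block boundary so that the excerpt is still a well-formed MAF file. The Python sketch below is one way to do that under stated assumptions: the function names and the file names in the usage comment are hypothetical, and block boundaries are taken to be the "a" lines that open each MAF alignment block.

```python
import sys
from typing import Iterable, Iterator, List


def _chunks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Split MAF lines into chunks; every chunk after the header starts at an 'a' line."""
    chunk: List[str] = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "a" and chunk:
            yield chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk


def excerpt_maf(lines: Iterable[str], max_bytes: int) -> List[str]:
    """Keep the header plus whole alignment blocks while staying within max_bytes.

    The header (lines before the first 'a' line) is always kept; reading stops
    as soon as the next block would push the total past max_bytes.
    """
    out: List[str] = []
    used = 0
    for chunk in _chunks(lines):
        size = sum(len(line.encode("utf-8")) for line in chunk)
        is_block = chunk[0].split()[:1] == ["a"]
        if is_block and used + size > max_bytes:
            break  # the next block would exceed the size limit
        out.extend(chunk)
        used += size
    return out


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python excerpt_maf.py sorted.maf excerpt.maf 50
    src_path, dst_path, megabytes = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(excerpt_maf(src, megabytes * 1024 * 1024))
```

Because the input is consumed lazily, only the first few megabytes of a 500 GB file are ever read. Note that even in a sorted file the last read in the excerpt may keep only some of its alignments, since the cut is made per block rather than per read.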

Also, please note that DIAMOND now supports alignment of long reads, so you might also consider aligning with DIAMOND and then running daa-meganizer on the output file (in long-read mode).

D