Trouble with maf2daa tool

kkrizanovic · September 9, 2022, 8:10am

Hello,

I’ve been testing out the MEGAN-LR pipeline with the LAST aligner. I downloaded some metagenome datasets (PacBio Sequel II) from NCBI (project PRJNA754443).

I mapped them to a protein database (NCBI NR) using the LAST aligner, generating a MAF file, and converted that to a DAA file using the maf2daa tool. Then I meganized the DAA file and produced a file with read-to-taxid information.

I’ve done this for three datasets (read file size is about 10Gb). The problem is that the pipeline works as expected for one of them, while for the other two the conversion to DAA produces a very small file and I get read-to-taxid information only for the first read (only one line in the CSV file). There is also no error message; maf2daa seems to complete its run without a problem.

The MAF files for all three test datasets are of similar size (about 1.5 TB). However, when I convert them to DAA, one DAA file is of the expected size (10GB), while the other two are very small (about 18-19 MB).

When I visually inspect the MAF files for the “bad” datasets, they look OK and contain what seem to be good alignments.

The thing is that I use the same pipeline for all three files; for one it works and for the other two it does not.

One more thing: when I use DIAMOND, all three datasets are mapped correctly and produce DAA files of similar size. However, the results seem worse than when I use the LAST aligner.

Do you have any idea what I might be doing wrong? Or is it a bug in the maf2daa tool? I’d be happy to provide any information that you need, but the files are just very large.

Daniel · September 14, 2022, 11:50am

Can you try converting just the first 10,000 lines of a MAF file, say? If that doesn’t work as expected, then please send the file to me.
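
One way to cut such a test file is sketched below in Python. It copies roughly the first 10,000 lines but stops at the last blank line, on the assumption that alignment blocks in the MAF file are separated by blank lines, so the excerpt never ends in the middle of a block. The function name and the file names are placeholders, not part of MEGAN or LAST.

```python
def write_maf_head(src_path, dst_path, max_lines=10_000):
    """Copy up to max_lines lines of a MAF file, ending on a complete block.

    Assumes blocks are separated by blank lines. Returns the number of
    lines written.
    """
    kept = []     # lines belonging to blocks known to be complete
    pending = []  # lines of the block currently being read
    with open(src_path, "r") as src:
        for n, line in enumerate(src, start=1):
            if n > max_lines:
                break  # drop whatever is still pending (incomplete block)
            pending.append(line)
            if line.strip() == "":
                # blank line closes the current block
                kept.extend(pending)
                pending = []
        else:
            # end of file reached within the limit: keep the final block too
            kept.extend(pending)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)


if __name__ == "__main__":
    import sys
    # usage (placeholder names): python maf_head.py big.maf small.maf
    if len(sys.argv) == 3:
        print(write_maf_head(sys.argv[1], sys.argv[2]), "lines written")
```

The resulting small file can then be passed to maf2daa in exactly the same way as the full file.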

kkrizanovic · September 14, 2022, 12:09pm

I tried converting a smaller MAF file (the first N lines, not sure how many), and that worked correctly. However, the full file seems to be OK; it doesn’t end abruptly or anything like that.
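
Since visual inspection is not practical on a 1.5 TB file, a rough structural scan might help narrow things down. The Python sketch below counts alignment blocks (lines starting with "a") and reports any block that does not contain exactly two "s" lines. It assumes pairwise LAST output has two sequence lines per block. It does not reproduce how maf2daa parses the file, so a clean result does not rule out a problem in the file. The function name is a placeholder.

```python
def scan_maf(path, expected_s_lines=2):
    """Count MAF blocks and list blocks with an unexpected number of "s" lines.

    Returns (block_count, problems), where problems is a list of
    (line number of the block's "a" line, number of "s" lines found).
    """
    blocks = 0
    problems = []
    start = None  # line number of the current block's "a" line
    s_count = 0   # "s" lines seen in the current block

    def close_block():
        nonlocal start, s_count
        if start is not None and s_count != expected_s_lines:
            problems.append((start, s_count))
        start = None
        s_count = 0

    # errors="replace" keeps the scan going if stray bytes are present
    with open(path, "r", errors="replace") as handle:
        for n, line in enumerate(handle, start=1):
            tag = line[:2]
            if tag in ("a ", "a\t"):
                close_block()  # a new block also ends the previous one
                blocks += 1
                start = n
            elif tag in ("s ", "s\t") and start is not None:
                s_count += 1
            elif line.strip() == "":
                close_block()
    close_block()  # the last block may not be followed by a blank line
    return blocks, problems
```

Comparing the block count of a “bad” file with the number of reads that end up in the DAA file, and looking at the first reported line number, would at least indicate where in the MAF file to start looking.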