How to run blast2lca multi-threaded, or is there a way to make it faster?

kmat0087 · February 26, 2022, 5:34am

Hi,

I would like to use MEGAN on the command line, and I am trying blast2lca from the megan package on bioconda (MEGAN Community Edition version 6.21.7). It worked fine on a small BLAST tabular file as a trial (about 10,000 sequences, 200MB). However, when I passed a larger file (about 10M sequences, 12GB), blast2lca had not finished even after 10 days. I thought I might need to specify multiple threads, but I couldn’t find such an option in the help message displayed by blast2lca -h.
Can we specify the number of threads for blast2lca? Or is there any way to make the process faster?

Daniel · March 7, 2022, 4:06pm

The blast2lca program is not parallelized, unfortunately…
Moreover, it is most likely not the program that you want to use. If you want to import data into MEGAN, then the best way to do this is to run DIAMOND, producing a DAA file (specify output format 100), and then to meganize that file. Or, if you have a BLAST file or similar, use either the blast2rma tool or MEGAN itself to import the file into RMA format.
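
For readers following along, the DIAMOND-then-meganize route described above might look roughly like the sketch below. Everything specific here is an assumption rather than something stated in the thread: the file names (reads.fasta, nr.dmnd, reads.daa, megan-map.db) are placeholders, blastx is assumed as the search mode (use blastp for protein queries), and the daa-meganizer flags (-i, -mdb) should be checked against daa-meganizer -h for your installed version. The script only prints the commands unless RUN=1 is set, so it is safe to try before the tools and databases are in place.

```shell
#!/bin/sh
# Sketch only: all file names are placeholders, and tool flags should be
# confirmed with each tool's -h output before a real run.
set -eu

# Print a command by default; execute it only when RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

reads="reads.fasta"     # hypothetical input sequences
db="nr.dmnd"            # hypothetical DIAMOND database
daa="reads.daa"         # DAA file that DIAMOND will write
mapdb="megan-map.db"    # hypothetical MEGAN mapping database file
threads=8               # DIAMOND is multi-threaded, unlike blast2lca

# Step 1: align with DIAMOND and write DAA output (-f 100).
run diamond blastx -q "$reads" -d "$db" -o "$daa" -f 100 -p "$threads"

# Step 2: meganize the DAA file (flags assumed; check daa-meganizer -h).
run daa-meganizer -i "$daa" -mdb "$mapdb"
```

Running it as-is prints the two command lines; running it with RUN=1 in the environment executes them.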

kmat0087 · March 18, 2022, 3:54pm

Thank you for your response!

I’m sorry I didn’t explain clearly enough what I wanted to do. I want to get a list of taxonomic annotations for my sequences (e.g., as a tab-delimited file) without launching the GUI app, so that I can process them automatically with a shell script.
If there is a better method for such a case, I would appreciate it if you could let me know.

Daniel · April 14, 2022, 7:20am

Then you need to use daa2info (for a meganized DAA file) or rma2info (for an RMA file) to extract such information.
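
As a rough illustration of that extraction step, a script might call daa2info as sketched below. The file names are placeholders, and the flags shown (-i, -o, -r2c Taxonomy for a read-to-taxon listing) are my assumption about the tool's options, so please verify them with daa2info -h; rma2info is assumed to take the same style of arguments for an RMA file. As with any untested command, the sketch prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: file names are placeholders, and the daa2info flags are
# assumptions to be checked against daa2info -h.
set -eu

# Print a command by default; execute it only when RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

daa="reads.daa"             # hypothetical meganized DAA file
out="read_to_taxon.txt"     # hypothetical output file

# Ask for a read-to-class listing for the Taxonomy classification.
run daa2info -i "$daa" -o "$out" -r2c Taxonomy
```

The resulting text file can then be fed to the rest of a shell pipeline without opening the MEGAN GUI.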

kmat0087 · July 22, 2022, 4:21am

Sorry for my late reply.
Actually I haven’t tried those yet, but daa2info and rma2info seem to be exactly what I needed!

Thank you.