Speeding up Weighted LCA output

ashtx · September 16, 2020, 2:04pm

So far we are very happy with the Megan and Diamond workflow. However, we have noticed that running “daa-meganizer” from the command line on Diamond-generated .daa files takes much longer when the “Weighted” LCA algorithm is used. We have already set the Java RAM limit to 100 GB using “-Xmx100000M” in Megan.vmoptions. We also set --threads to 24, but the process rarely uses more than 2 CPUs at any point.
For example, it took more than 12 hours to meganize a 5.5 GB .daa file with 24 CPUs and 100 GB of RAM.
Are we missing something when using Weighted LCA?

Megan version 6.19.2 built 17 Jun 2020
Diamond version 0.9.29

CPU : Intel® Core™ i9-9940X CPU @ 3.30GHz
RAM : 128GB
OS : Ubuntu 18.04.4 LTS

Thank you.

Daniel · September 25, 2020, 3:01pm

Unfortunately, the weighted LCA does indeed take a long time because it does two rounds of LCA calculations: it first assigns weights to all references, and then does a second round that uses those weights to perform the actual binning. The current implementation is also single-threaded, so adding cores won’t help.
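
To make the two rounds concrete, here is a rough Python sketch of the idea. It is not MEGAN's code. The toy taxonomy, the reference names (refA, refB, refC), the weighting rule (a reference gains one unit of weight for each read whose naive LCA equals that reference's taxon) and the 80 percent coverage threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not MEGAN's data structures.
PARENT = {
    "root": None,
    "Enterobacteriaceae": "root",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "S. enterica": "Salmonella",
}

# Hypothetical reference sequences and the taxon each one belongs to.
REF_TAXON = {"refA": "E. coli", "refB": "E. albertii", "refC": "S. enterica"}


def path_to_root(taxon):
    """Return the list of nodes from `taxon` up to and including the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(taxa):
    """Naive LCA: the deepest node shared by the root paths of all taxa."""
    paths = [path_to_root(t) for t in taxa]
    common = set(paths[0]).intersection(*paths[1:])
    return next(node for node in paths[0] if node in common)


def weighted_lca(read_hits, percent=80):
    """Two-round sketch. `read_hits` maps read name -> list of reference names."""
    # Round 1 (assumed rule): a reference gains weight for every read whose
    # naive LCA lands exactly on that reference's taxon.
    weights = Counter()
    for refs in read_hits.values():
        naive = lca([REF_TAXON[r] for r in refs])
        for r in refs:
            if REF_TAXON[r] == naive:
                weights[r] += 1

    # Round 2: place each read on the deepest node that covers at least
    # `percent` of the total weight of the references it hits.
    assignment = {}
    for read, refs in read_hits.items():
        total = sum(weights[r] for r in refs)
        if total == 0:
            # No weighted reference among the hits: fall back to the naive LCA.
            assignment[read] = lca([REF_TAXON[r] for r in refs])
            continue
        covered = Counter()
        for r in refs:
            for node in path_to_root(REF_TAXON[r]):
                covered[node] += weights[r]
        candidates = [n for n, w in covered.items() if w * 100 >= percent * total]
        assignment[read] = max(candidates, key=lambda n: len(path_to_root(n)))
    return assignment


# Made-up alignments: four reads hit only refA, one hits only refC,
# and r6 hits both.
READ_HITS = {
    "r1": ["refA"], "r2": ["refA"], "r3": ["refA"], "r4": ["refA"],
    "r5": ["refC"],
    "r6": ["refA", "refC"],
}

if __name__ == "__main__":
    for read, taxon in weighted_lca(READ_HITS).items():
        print(read, "->", taxon)
```

In this toy data the naive LCA would place r6 on Enterobacteriaceae, while the weighted version places it on E. coli because refA carries 4 of the 5 units of weight. In the sketch, round 2 cannot start until round 1 has seen every read, so the whole input is processed twice, which is consistent with the long runtimes described above.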