I want to use DIAMOND+MEGAN for a sequence dataset of 80 metagenomes. I have run a test with one file using diamond blastx + meganizer and opened the file in MEGAN.
Everything worked beautifully, but the resulting DIAMOND output file is very large (~25 GB), even after reducing --max-target-seqs to 10. While I can run the analysis on a cluster, I will have to view all the files in MEGAN on my own computer, so downloading and opening 80 files of ~25 GB each doesn’t seem like a viable option.
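For reference, my test run looked roughly like this (file names, database path and thread count are placeholders, and I have omitted the daa-meganizer mapping-file options, which depend on the MEGAN version):

```
# DIAMOND blastx against NR, writing a DAA file (--outfmt 100) that MEGAN can read
diamond blastx \
    --query sample01.fastq.gz \
    --db nr.dmnd \
    --out sample01.daa \
    --outfmt 100 \
    --max-target-seqs 10 \
    --threads 16

# "Meganize" the DAA file so it can be opened directly in MEGAN
# (mapping-file options omitted; they differ between MEGAN releases, see daa-meganizer -h)
megan/tools/daa-meganizer --in sample01.daa
```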
Would it be OK to reduce --max-target-seqs even more, to 5 or maybe even 1? Or is there an alternative way to process the DIAMOND output for MEGAN?
No, please do not reduce the number of alignments per read. This will result in a large increase in false positive assignments.
There are a number of alternatives.
We are working on new releases of MEGAN that will work with UniRef50, UniRef90 and UniRef100, and with clustered versions of NR.
You can use the program megan/tools/compute-comparison to compute a single, small comparison file for all your 80 files. You can download and open that.
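Roughly along these lines (option names are from memory and may differ between MEGAN releases, so please check compute-comparison -h):

```
# Combine all meganized DAA files into one small comparison document
# that can be downloaded and opened in MEGAN on a desktop machine
megan/tools/compute-comparison \
    -i daa_files/*.daa \
    -o all_80_samples.megan
```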
Thanks for the tips.
I have played around a bit with different numbers and noticed that I end up with way too many unassigned reads if I reduce the number of alignments too much. The counts did seem to stabilize somewhere between 25 and 100 alignments per read, so I’m now going with 50 and the compute-comparison tool.
I assume that you mean that the number of functional assignments decreases? Taxonomic assignments shouldn’t be affected. The nice thing about using the smaller databases (as soon as they become available, and we are nearly there) is that the assignment rate for functional assignments goes up, whereas the time to compute the alignments goes down…
No, it also decreased for the taxonomic assignment.
For my comparison I got ~22,000,000 (1 max alignment), ~7,400,000 (10), ~5,000,000 (25), ~4,700,000 (50) and ~4,700,000 (100) reads in the “not assigned” category, so quite a dramatic difference.
OK, that makes sense: unassigned is not the same as unaligned. So this indicates that there is a mapping problem: you are using a reference database that is much more recent than the mapping file. I will generate a new, up-to-date mapping file.
Sorry for the confusion. I am using the NR reference database on our computing cluster, so I am not exactly sure how recent that is. As I have already run DIAMOND followed by daa-meganizer on some of the files, will it be possible to run daa-meganizer again on the already meganized files with an updated mapping file, or do I have to generate a fresh DAA file for that?
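In other words, I am hoping something along these lines would work on the existing files once the new mapping file is available (the mapping file name is a placeholder and the option names are just my guess from the current documentation):

```
# Hoped-for: re-run daa-meganizer on an already-meganized DAA file,
# pointing it at the updated mapping database
# (check daa-meganizer -h for the exact options in your MEGAN version)
megan/tools/daa-meganizer \
    --in sample01.daa \
    --mapDB megan-map-updated.db
```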