Hi MEGAN Community!
I feel this topic has been covered at least partially in previous posts, but my lab group is really having a difficult time interpreting and agreeing on what each of the LCA parameters does. I have read the manual and I think I have a general impression of what each setting does, but it would be nice to get validation from experts and other users before we begin our analysis in earnest.
We are using MEGAN6 and have processed our short reads through DIAMOND.
In DIAMOND we use an e-value cutoff of 0.0001 and the --top 5 option (which, as I understand it, keeps only alignments whose bit score is within 5% of the best hit for each query), and we are comparing against the NCBI nr database.
This is our DIAMOND command (it runs inside a for loop over our samples):

```
diamond blastx -d ~/Diamond_Tools/BLAST_DB/nr_4_23_2018.dmnd -q $pups -o $pups.daa -f 100 --sallseqid --top 5 -b 8 -e 0.0001
```
My questions are two-fold. The default settings for the LCA algorithm, at least in my version, are as follows:
- Min Score: 50
- Max Expected Value: 0.01
- Percent Identity: 0
- Top Percent: 10%
- Min Support Percent: 0.05
I'm wondering how these defaults were determined, and what the impact of changing them is, in layman's terms if possible. For example, if we set the Max Expected Value very small, does that mean we will get fewer hits at deep taxonomic ranks (species), but the ones we do get will be more reliable? Also, since we are already filtering on e-value in DIAMOND (and potentially on bit score, which I believe is an option), are the first three parameters redundant?
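To make sure we're all asking about the same thing, here is my reading of how the first three parameters act on each alignment before the LCA step. This is just a hypothetical sketch of my interpretation (the function name and logic are mine, not MEGAN's code), so please correct me if it's wrong:

```python
def passes_hit_filters(bit_score, e_value, pct_identity,
                       min_score=50.0, max_expected=0.01, min_pct_id=0.0):
    """My understanding: an alignment must pass all three thresholds
    (Min Score, Max Expected Value, Percent Identity) to be kept for the LCA."""
    return (bit_score >= min_score
            and e_value <= max_expected
            and pct_identity >= min_pct_id)

# A hit that already passed DIAMOND's -e 0.0001 trivially satisfies
# Max Expected 0.01, but could still be dropped by Min Score 50:
print(passes_hit_filters(bit_score=45.0, e_value=1e-5, pct_identity=80.0))  # False
```

If that reading is right, then our DIAMOND e-value cutoff makes Max Expected redundant for us, but Min Score would still be doing real filtering.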
I'm mostly confused about the Top Percent value. Is this a secondary filter, similar to DIAMOND's --top, in which MEGAN looks at all the alignment bit scores for one query but only considers alignments whose bit score is within 10% of the highest score?
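In other words, my interpretation of Top Percent is the following per-query filter (again, a hypothetical sketch of my understanding, not MEGAN's actual implementation):

```python
def top_percent_filter(bit_scores, top_percent=10.0):
    """Keep only bit scores within top_percent of the best score for one query."""
    if not bit_scores:
        return []
    best = max(bit_scores)
    cutoff = best * (1.0 - top_percent / 100.0)
    return [s for s in bit_scores if s >= cutoff]

# Best score 200 with a 10% window keeps everything scoring >= 180:
print(top_percent_filter([200, 190, 181, 150], 10.0))  # [200, 190, 181]
```

Is that the right mental model, and if so, does it interact with the --top 5 filter we already applied in DIAMOND?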
Forgive me if these questions are obvious, but I appreciate your assistance and guidance!