minPercentIdentity and lcaCoveragePercent

TPM · January 16, 2025, 12:56am

I’m using MEGAN6 CE from the command line. If I change the minPercentIdentity parameter, do I also need to change the lcaCoveragePercent parameter?

For example, I’m working on some DNA metabarcoding data for a diet study in ecology. The common parameter used for percent identity in other studies (not using MEGAN) seems to be 94% minimum percent identity to a reference sequence through BLAST. So I assume I should try to set my minPercentIdentity to 94. Do I then need to change the lcaCoveragePercent parameter too? I’ve seen that 80 is recommended for long reads, but my data isn’t long reads (all sequences <150 bp), and in another forum question I saw the developer say that 51% is ok too.

What impact would changing the lcaCoveragePercent parameter have on any classifications? And what would people recommend changing it to, if I should change it? Ideally I’d get down to Genus for most of my classifications, but I’d be happy with Family.

Thanks so much for your help, this is a great forum.

Anupam · January 22, 2025, 1:09pm

Hi @TPM,

The lcaCoveragePercent parameter doesn’t matter if you are performing just the naïve LCA, which I believe is the default for short reads. It only becomes relevant if you intend to use weighted LCA (short reads) or interval union LCA (typically designed for long reads).

Anupam

TPM · January 23, 2025, 1:58am

Hi Anupam,

Thanks for your reply. I was using weighted LCA. So what should I put in for this parameter if I’m using weighted?

Thanks

Anupam · January 30, 2025, 7:57pm

Hi @TPM,

Sorry for the delayed response. If I remember correctly, 80% is the recommended threshold for weighted LCA. The 51% value you saw refers to a change made for the long-read interval union LCA, not for weighted LCA.

However, you can try tuning the threshold for Weighted LCA as needed. I assume the default is 80%.
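To give a feel for what tuning does, here is a rough Python sketch of the general idea behind a coverage threshold: the read is placed on the deepest taxon whose subtree holds at least the given percentage of the total reference weight. This is only an illustration, not MEGAN's actual code. The function `place_read`, the taxon names, and the hand-picked weights are all made up for the example (as far as I understand, MEGAN derives the weights itself from the data), and the sketch assumes a threshold above 50 so that at most one taxon per rank can qualify.

```python
from collections import defaultdict


def place_read(hits, coverage_percent):
    """Return the deepest taxon covering >= coverage_percent of the weight.

    hits: list of (lineage, weight), where lineage is a tuple of taxon
    names ordered from the root down to the matched reference taxon.
    Hypothetical helper for illustration only, not part of MEGAN.
    Assumes all lineages share the same root and coverage_percent > 50.
    """
    total = sum(weight for _, weight in hits)

    # Sum the weight that falls under every node on every lineage.
    covered = defaultdict(float)
    for lineage, weight in hits:
        for depth in range(1, len(lineage) + 1):
            covered[lineage[:depth]] += weight

    needed = total * coverage_percent / 100.0

    # Among nodes that reach the threshold, keep the deepest one.
    qualifying = [path for path, w in covered.items() if w >= needed - 1e-9]
    return max(qualifying, key=len)[-1]


# Made-up example: one read with hits to three reference species.
hits = [
    (("root", "FamilyA", "GenusA1", "SpeciesX"), 6),
    (("root", "FamilyA", "GenusA1", "SpeciesY"), 2),
    (("root", "FamilyA", "GenusA2", "SpeciesZ"), 2),
]

print(place_read(hits, 100))  # FamilyA
print(place_read(hits, 80))   # GenusA1
print(place_read(hits, 51))   # SpeciesX
```

In this toy case a lower threshold gives a more specific placement (species at 51, genus at 80, family at 100), at the cost of ignoring a larger share of the conflicting hits. So if genus-level calls are the goal, it is worth checking how your assignments shift as you move the value down from 80.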

Best,
Anupam