Taxonomic Breakdown of Functional Assignments?

Greetings MEGAN and friends,

So far in my use of MEGAN I’ve been able to easily view and analyze the “Taxonomic” breakdowns of my samples or the “Functional” breakdowns (I2g, EggNog, etc.). But these are done separately, within different tabs/modules and seem oddly isolated from one another. I can generate PCoA clusters with either taxa or one of the functional modules but these features never seem to mix.

My question: is it possible to generate a taxonomic breakdown of functional categories within MEGAN?

For example, say I want to know the taxonomic breakdown (which taxa seem to be contributing) of the carbohydrate enzyme family GH43 in the Interpro viewer. Is there a way to pull up this information in the current build of MEGAN? Theoretical output: <GH43 Domain Breakdown: 35% Bacteria, 45% Eukaryota, 20% unassigned taxonomically>

Currently, I’m trying to extract the reads in different ways to generate something like the theoretical output above. I suspect that the “extract to new document” feature is the answer: select the enzyme node of interest, extract it, and switch to the taxa tab in the new document. But it seems to be computationally expensive and thus far hasn’t worked (I will try again soon). The alternative I’ve been exploring is “export … csv… ReadID_Taxname”. Assuming the read IDs are conserved across all modules of MEGAN, I could in theory also export by functional node within the Interpro viewer and manually work out which taxonomically assigned reads match the Interpro reads.
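To make the manual matching idea concrete, here is a minimal Python sketch of what I have in mind. It assumes both exports are plain two-column files (read ID, then class name) and that the read IDs really are identical in both; the function names and the delimiter are my own placeholders, not anything MEGAN provides.

```python
import csv
from collections import Counter


def load_read_map(path, delimiter=","):
    """Load a two-column export (read ID, class name) into a dict.

    Assumption: one read per line; the delimiter may need to be a tab
    depending on how the file was exported.
    """
    mapping = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) >= 2:
                mapping[row[0].strip()] = row[1].strip()
    return mapping


def taxonomic_breakdown(read_to_taxon, read_to_function, function_name):
    """Return {taxon: percent of reads} for reads assigned to function_name.

    Reads with a functional assignment but no taxonomic one are counted
    under "unassigned".
    """
    counts = Counter()
    for read_id, function in read_to_function.items():
        if function == function_name:
            counts[read_to_taxon.get(read_id, "unassigned")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}
```

With two such exports loaded through `load_read_map`, calling `taxonomic_breakdown(taxa, functions, "GH43")` would give the kind of percentage breakdown described above, provided the read IDs match across modules.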

I’m unsure if this belongs within the “User Question” or “Feature Request” category, as this feature may already exist. I certainly have more requests following this general inquiry, but will post these later in the feature request category.




Dear Paulson,

it would be nice to have this. However, for short reads I don’t think that taxonomy and function can both be determined with enough certainty to make binning reads by taxon × function feasible.

For long reads and contigs, taxonomic assignment is much more reliable, so there it does make sense. I will look into this.

@Daniel I had a follow-up question on this. We have a large project which has been functionally annotated using KEGG IDs, but we would like to further stratify these by taxonomic rank (similar to the HUMAnN v3 gene family output). Not sure if this is possible without extracting reads?

I could implement a command-line tool in which one chooses two classifications, e.g. Taxonomy and KEGG, and it will list all pairwise counts, something like:

Pseudoalteromonas sp. BSi20429 <tab> K02833 GTPase HRas <tab> 123

(Of course, with the option to output IDs rather than strings.)

Would that be suitable?

That would be perfect. A consolidated one (from multiple samples) would also be great, but we could essentially do that from individual samples with some scripting.
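For the consolidation step, something like the following Python sketch is what we would script on our side. It assumes each sample yields a tab-separated file in the taxon, function, count layout proposed above; the function names are our own placeholders and the final output format of the tool may differ.

```python
import csv
from collections import defaultdict


def parse_pair_counts(lines):
    """Parse taxon <tab> function <tab> count lines into {(taxon, function): count}.

    Lines without a numeric third column (headers, comments) are skipped.
    """
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3 or not row[2].strip().isdigit():
            continue
        pair = (row[0].strip(), row[1].strip())
        counts[pair] = counts.get(pair, 0) + int(row[2])
    return counts


def consolidate(samples):
    """Merge per-sample counts into {(taxon, function): {sample: count}}.

    `samples` maps a sample name to the dict returned by parse_pair_counts.
    """
    table = defaultdict(dict)
    for sample, counts in samples.items():
        for pair, n in counts.items():
            table[pair][sample] = n
    return dict(table)
```

Each per-sample file can be passed to `parse_pair_counts` as an open file handle, since it only needs an iterable of lines; pairs missing from a sample are simply absent from that pair's inner dict and can be treated as zero.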

I have implemented a new tool called taxonomy2function that takes as input an RMA or meganized DAA file and the names of two classifications, such as Taxonomy and EGGNOG, and produces a text file as output reporting pairs of classes and the number or names of the assigned reads.
This is available in version 6.21.13, which I am currently uploading.
Please give it a go and let me know whether there are any problems, or any missing features.

Thanks @Daniel, we’ll give it a try and report back!