Contribution of species to KEGG KOs


Is there an easy way to extract combined taxonomic and functional information in MEGAN? In other words, is it possible to extract, e.g., the KEGG KOs identified in a sample together with the taxa (e.g. NCBI) they are assigned to?

Right now, I can obtain the relative abundance of KEGG KOs at the community level, but I wonder whether it is also possible to obtain the contributions of known and unknown species to each KO.

For example, the HUMAnN gene families output file lists, for each functional unit, the taxa it can be assigned to. Example:

# Gene Family	$SAMPLENAME_Abundance-RPKs
UNMAPPED	187.0
UniRef50_unknown	150.0
UniRef50_unknown|g__Bacteroides.s__Bacteroides_fragilis	150.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon	67.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_fragilis	57.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_finegoldii	5.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|g__Bacteroides.s__Bacteroides_stercoris	4.0
UniRef50_A6L0N6: Conserved protein found in conjugate transposon|unclassified	1.0
UniRef50_O83668: Fructose-bisphosphate aldolase	60.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_vulgatus	31.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_thetaiotaomicron	22.0
UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_stercoris	7.0
  • This file details the abundance of each gene family in the community. Gene families are groups of evolutionarily-related protein-coding sequences that often perform similar functions.
  • Gene family abundance at the community level is stratified to show the contributions from known and unknown species. Individual species’ abundance contributions sum to the community total abundance.
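
To make the stratification concrete, here is a minimal Python sketch that reads rows in the layout shown above and separates community totals from per-taxon contributions. The function name `parse_stratified` is my own, not part of HUMAnN or MEGAN, and the sketch assumes the abundance is always the last whitespace-separated field on a line.

```python
from collections import defaultdict


def parse_stratified(lines):
    """Split HUMAnN-style rows into community totals and per-taxon contributions.

    Hypothetical helper for illustration; not part of HUMAnN or MEGAN.
    """
    totals = {}
    by_taxon = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header row
        # The feature name may contain spaces, so split off only the last field.
        feature, value = line.rsplit(None, 1)
        abundance = float(value)
        if "|" in feature:
            # Stratified row: "<gene family>|<taxon>"
            family, taxon = feature.split("|", 1)
            by_taxon[family][taxon] = abundance
        else:
            # Unstratified row: community-level total
            totals[feature] = abundance
    return totals, dict(by_taxon)


# Rows copied from the example output above
example = [
    "# Gene Family\t$SAMPLENAME_Abundance-RPKs",
    "UniRef50_O83668: Fructose-bisphosphate aldolase\t60.0",
    "UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_vulgatus\t31.0",
    "UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_thetaiotaomicron\t22.0",
    "UniRef50_O83668: Fructose-bisphosphate aldolase|g__Bacteroides.s__Bacteroides_stercoris\t7.0",
]

totals, by_taxon = parse_stratified(example)
for family, contributions in by_taxon.items():
    for taxon, abundance in contributions.items():
        share = abundance / totals[family]
        print(f"{family}\t{taxon}\t{abundance}\t{share:.1%}")
```

This is the kind of per-KO, per-taxon table I would like to get out of MEGAN.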

Thank you! Best regards, Bernhard

There is a program in the tools directory called taxonomy2function that addresses this task.

I can add that this worked quite well for us, though I believe it works on one sample at a time. I don’t think it works in comparison mode with multiple samples (or at least it didn’t when we first tried it; maybe it does now!)

No, it needs access to the reads and their classifications, which are not present in a comparison file.
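
Given that restriction, one workaround is to run the tool on each sample separately and merge the per-sample tables afterwards. The sketch below only illustrates the merge step. It assumes each sample's result has already been loaded as a mapping from (taxon, KO) to a count. That layout, the function name `merge_samples`, and the toy labels are all hypothetical and do not reflect the actual taxonomy2function output format.

```python
from collections import defaultdict


def merge_samples(per_sample):
    """Combine per-sample {(taxon, ko): count} tables into one table.

    Hypothetical helper; the input layout is an assumption, not the
    real output format of any MEGAN tool.
    """
    samples = sorted(per_sample)
    merged = defaultdict(lambda: [0.0] * len(samples))
    for col, sample in enumerate(samples):
        for key, count in per_sample[sample].items():
            merged[key][col] += count
    return samples, dict(merged)


# Made-up placeholder values, for illustration only
per_sample = {
    "sampleA": {("taxon_1", "KO_a"): 12, ("taxon_2", "KO_a"): 3},
    "sampleB": {("taxon_1", "KO_a"): 7},
}

samples, merged = merge_samples(per_sample)
print("taxon", "KO", *samples, sep="\t")
for (taxon, ko), counts in sorted(merged.items()):
    print(taxon, ko, *counts, sep="\t")
```

Pairs missing from a sample are filled with zero, so every row has one value per sample.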