Connecting taxonomic assignments and SEED groupings

lebrune · June 21, 2017, 10:57pm

I apologize in advance if this is a silly question, as I am new to using MEGAN. I am analyzing a metagenome dataset in which I am primarily interested in function, so I am doing a SEED analysis (I would prefer KEGG, but I am using MEGAN 6 Community Edition). I want to pull out the identity of the taxa represented in a SEED node. Is there any streamlined way to do this? The best approach I can think of right now is to extract the reads associated with the node and run another taxonomic assignment on those reads. Since the taxonomic analysis was already done at the time of the SEED analysis, it seems like there should be a way to tie the two together, and I am just missing it.

Thank you in advance,
Erick

Daniel · June 24, 2017, 8:24am

MEGAN doesn’t directly connect across different classifications.

Select the SEED nodes of interest and then use Extract to New Document to create a new document that only contains the reads that have been placed on the selected nodes. You can then study the taxonomic assignments of the reads.
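Outside MEGAN, the same cross-tabulation can be sketched in Python once per-read assignments are available for both classifications. This is a minimal sketch, assuming two hypothetical tab-separated exports, one with lines of the form `read<TAB>taxon` and one with lines of the form `read<TAB>SEED node`; that layout and the function names `parse_assignments` and `taxa_per_function` are illustrative assumptions, not part of MEGAN.

```python
from collections import Counter, defaultdict
from typing import Iterable


def parse_assignments(lines: Iterable[str]) -> dict[str, str]:
    """Parse 'read<TAB>class' lines (hypothetical export layout) into a dict."""
    assignments: dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        read, _, label = line.partition("\t")
        if label:
            assignments[read] = label
    return assignments


def taxa_per_function(read_to_taxon: dict[str, str],
                      read_to_function: dict[str, str]) -> dict[str, Counter]:
    """For each functional node, count the taxa of the reads assigned to it."""
    table: dict[str, Counter] = defaultdict(Counter)
    for read, function in read_to_function.items():
        # Reads with no taxonomic assignment are tallied as 'Unassigned'
        taxon = read_to_taxon.get(read, "Unassigned")
        table[function][taxon] += 1
    return dict(table)


# Toy input standing in for the two exported files (invented data)
taxa = parse_assignments(["r1\tEscherichia coli", "r2\tBacillus subtilis", "r3\tEscherichia coli"])
funcs = parse_assignments(["r1\tFlagellar motility", "r2\tFlagellar motility",
                           "r3\tChemotaxis", "r4\tChemotaxis"])
result = taxa_per_function(taxa, funcs)
print(result["Flagellar motility"])  # Counter({'Escherichia coli': 1, 'Bacillus subtilis': 1})
```

Joining on the read name is what ties the two classifications together, which is essentially what extracting the reads of a SEED node and then inspecting their taxonomy achieves inside MEGAN.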

lebrune · June 27, 2017, 10:42pm

Thank you Daniel. That is a reasonable enough way of going about it. I appreciate it!

lebrune · June 28, 2017, 12:47am

Hello Daniel, I am attempting to do this, but I get a “No Reads Extracted” warning. If I simply use “Extract reads”, it works fine. I checked the manual, and it says Extract to New Document only works on .daa and .rma6 files, but I am using a .daa file, so I am not sure what the issue could be. Any advice is much appreciated.

Daniel · June 29, 2017, 9:06am

Could you please show me a screenshot of the nodes that you selected before attempting to use the command?

lebrune · June 29, 2017, 6:56pm

I think I figured it out. It looks like if any leaves below the node you are trying to extract to a new document are expanded in the viewer, it does not work. If I collapse everything below that node, it works. An interesting quirk, but I wouldn’t have figured it out without your help. Thank you.

Lmac · November 24, 2017, 6:25pm

Hello Daniel. When I open the new document (after extracting a selected KEGG/SEED node to a new document), I do obtain the expected number of reads assigned to this specific function, but also reads associated with other categories. Is this due to other hits? If so, how can I keep only the best hit for the functional annotation without interfering with the LCA assignment for taxonomy? Thank you very much.

Daniel · November 24, 2017, 6:28pm

This is probably because a single KO (or SEED functional role) can belong to a number of different pathways (and similarly for SEED subsystems).
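A toy model makes the effect concrete: each read is assigned to one KO, but a KO may be listed under several pathways, so the reads extracted for one pathway also show up under every other pathway their KOs belong to. The identifiers below (`KO_A`, `pathway_1`, and so on) are made up for illustration and are not real KEGG data; the sketch only shows the counting logic, not how MEGAN implements extraction.

```python
from collections import Counter

# Hypothetical toy data: identifiers and memberships are illustrative only.
ko_to_pathways = {
    "KO_A": {"pathway_1", "pathway_2"},  # one KO shared between two pathways
    "KO_B": {"pathway_1"},
    "KO_C": {"pathway_3"},
}

# Each read is assigned to exactly one KO
read_to_ko = {"r1": "KO_A", "r2": "KO_A", "r3": "KO_B", "r4": "KO_C"}


def extract_pathway(pathway: str) -> set[str]:
    """Return the reads whose KO is listed under the selected pathway."""
    return {read for read, ko in read_to_ko.items() if pathway in ko_to_pathways[ko]}


def pathway_counts(reads: set[str]) -> Counter:
    """Count every pathway under which the extracted reads appear."""
    counts: Counter = Counter()
    for read in reads:
        for pathway in ko_to_pathways[read_to_ko[read]]:
            counts[pathway] += 1
    return counts


extracted = extract_pathway("pathway_1")
print(sorted(extracted))                          # ['r1', 'r2', 'r3']
print(sorted(pathway_counts(extracted).items()))  # [('pathway_1', 3), ('pathway_2', 2)]
```

The selected pathway still gets exactly its own reads; `pathway_2` appears only because `KO_A` is shared, not because extra hits were pulled in.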

Eric · September 11, 2018, 6:18am

Hi Daniel,

Is there a way to also extract the number of reads, e.g., how many times a certain sequence appears, similar to an OTU table?

Many thanks!
Cheers,
Eric

Daniel · September 19, 2018, 11:44am

Use File->Export->As Text to do this interactively.
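Once per-read assignments have been exported for several samples, an OTU-style table (features as rows, samples as columns) takes only a few lines of Python. This is a minimal sketch, assuming each sample's export has already been parsed into a dictionary mapping read names to class names; the sample names, classes, and the helpers `count_table` and `to_tsv` are hypothetical.

```python
from collections import Counter


def count_table(samples: dict[str, dict[str, str]]) -> dict[str, dict[str, int]]:
    """Build a feature-by-sample count table from per-sample read->class maps."""
    # Count reads per class within each sample
    per_sample = {name: Counter(assignments.values()) for name, assignments in samples.items()}
    features = sorted(set().union(*per_sample.values()))
    # Features absent from a sample get an explicit zero
    return {
        feature: {name: counts.get(feature, 0) for name, counts in per_sample.items()}
        for feature in features
    }


def to_tsv(table: dict[str, dict[str, int]], sample_names: list[str]) -> str:
    """Render the table as tab-separated text with a header row."""
    lines = ["feature\t" + "\t".join(sample_names)]
    for feature, row in table.items():
        lines.append(feature + "\t" + "\t".join(str(row[s]) for s in sample_names))
    return "\n".join(lines)


# Invented example data for two samples
samples = {
    "sampleA": {"r1": "Chemotaxis", "r2": "Chemotaxis", "r3": "Flagellum"},
    "sampleB": {"r7": "Flagellum"},
}
table = count_table(samples)
print(to_tsv(table, ["sampleA", "sampleB"]))
```

Filling in explicit zeros keeps every row the same width, which most downstream tools expect from a count matrix.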

cjfields · February 25, 2019, 9:39pm

I’m trying to do something analogous for KEGG categories, but at a higher level (the BRITE category), and I am seeing what look like additional functions getting included. Is this normal?

For example, I found that ko02035 (Bacterial motility proteins) appears in a simple enrichment analysis of the features that are changing; this is one level higher than the leaves (KEGG orthology IDs). If I try to extract those hits to a new document, I see other KEGG categories pop up (note ko02030 and ko02040, for example).

Daniel · February 26, 2019, 12:15am

That is normal, because the same KEGG orthology groups appear in different ko pathways. For example, in your example the fattest leaf, K03406(?), appears twice.

cjfields · February 26, 2019, 12:45am

Ah yes, you are correct! I think I had this in mind due to the rankings when exported into a BIOM file (which are simpler). Thanks @Daniel!

joyce2896 · February 6, 2020, 9:47am

Hi Daniel,

I’m trying to use the “Extract to New Document” feature after selecting a main node in SEED. The process takes extremely long (I don’t think it had completed even after waiting overnight), and I can’t obtain any reads assigned to that specific function (I saw the function.rma file, but when I open it there are 0 reads). I’m currently using version 6.15. What might I have done wrong? Also, when I try to inspect the nodes, I can only see the taxon/function name, not the read sequences.

Thank you in advance.
Joyce

Daniel · February 14, 2020, 7:25am

Hi Joyce,

Extract to New Document can be slow, but it shouldn’t take overnight.
How many reads are in the original document, how many reads are you trying to extract, and how much memory have you given MEGAN (more memory means higher speed…)?

Do you have an RMA file or a meganized DAA file? If it is an RMA file and you can’t see the reads then the most likely explanation is that you didn’t provide the reads file during creation of the RMA file.

joyce2896 · February 19, 2020, 9:12am

Dear Daniel,

Yes, I managed to solve the problem. Indeed, more RAM is needed for the extraction: previously we gave MEGAN 6 GB, and now we give it 12 GB. Also, the data was stored on a server, so it took time for MEGAN to retrieve it; it works once I copy the DAA file to my laptop. I used a meganized DAA file. I guess I need to extract a document for each sample individually and compare them across samples, because if I extract from the “compared” file, I can’t tell which sample a taxon/function comes from. Thanks for the help, Daniel!

carolinnerdc · September 18, 2020, 10:46pm

Thank you for describing what you did. It only worked for me the way you described.