Can I get a "Taxonomic Paths" file in MEGAN6?

keima · July 29, 2016, 4:22pm

Hello,
I have used MEGAN5, and now I am trying MEGAN6.
I prefer the text file exported by “File→Export→Taxonomic Paths” in MEGAN5. The exported file contains the read names, their taxonomic paths, and the LCA% of the taxa.
But I can’t find the function in MEGAN6. Can’t MEGAN6 export the “taxonomic path”?

Daniel · August 3, 2016, 10:02am

Have you explored the File->Export->Export CSV… menu item?
You will find a number of export options that involve “TaxonPath”.
For example, TaxonPath_to_count leads to this kind of output:

group;Bacteroidetes;Flavobacteriia;" 1906
"root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Sphingobacteriia;" 66
"root;cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Chlorobi;Chlorobia;" 205
"root;cellular organisms;Bacteria;Fusobacteria;Fusobacteriia;" 13
"root;cellular organisms;Bacteria;Nitrospirae;Nitrospira ;" 13
"root;cellular organisms;Bacteria;Proteobacteria;Acidithiobacillia;" 7
"root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;" 239
"root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;" 88
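If you want to summarize TaxonPath_to_count output outside MEGAN, lines like these can be rolled up to a chosen path depth. The Python sketch below assumes each line is a path in straight ASCII double quotes followed by whitespace (tab or space) and an integer count; the forum may render the quotes as curly, so check your actual export. `parse_taxonpath_counts` and `rollup` are hypothetical helpers, not part of MEGAN. Depth is positional: because the paths include unranked nodes such as "cellular organisms" or "FCB group", a fixed depth does not always correspond to the same rank.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# One TaxonPath_to_count line: a double-quoted path, whitespace, then an integer count.
LINE_RE = re.compile(r'^"(?P<path>[^"]*)"\s+(?P<count>\d+)$')


def parse_taxonpath_counts(lines: Iterable[str]) -> Iterator[Tuple[List[str], int]]:
    """Yield (nodes, count) for each well-formed line; other lines are skipped."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # Paths end with ';', so drop empty elements. Names are kept verbatim,
        # including trailing spaces such as "Nitrospira " in the sample above.
        nodes = [n for n in m.group("path").split(";") if n]
        yield nodes, int(m.group("count"))


def rollup(lines: Iterable[str], depth: int) -> Dict[str, int]:
    """Sum counts over the first `depth` nodes of each path."""
    totals: Counter = Counter()
    for nodes, count in parse_taxonpath_counts(lines):
        totals[";".join(nodes[:depth])] += count
    return dict(totals)


if __name__ == "__main__":
    sample = [
        '"root;cellular organisms;Bacteria;Fusobacteria;Fusobacteriia;" 13',
        '"root;cellular organisms;Bacteria;Proteobacteria;Acidithiobacillia;" 7',
        '"root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;" 239',
        '"root;cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;" 88',
    ]
    for path, total in rollup(sample, 4).items():
        print(total, path)
```

With these four sample lines, rolling up at depth 4 gives 334 for Proteobacteria (7 + 239 + 88) and 13 for Fusobacteria.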

Or try ReadName_to_TaxonPath for this type of output:

GDEG1CX11GSH3L.2 "root;cellular organisms;Bacteria;Aquificae ;Aquificae;"
GDEG1CX11GTL0G.2 "root;cellular organisms;Bacteria;Aquificae ;Aquificae;"
GDEG1CX11GX5HC.1 "root;cellular organisms;Bacteria;Aquificae ;Aquificae;"
GDEG1CX11GXWU5.2 "root;cellular organisms;Bacteria;Aquificae ;Aquificae;"
GDEG1CX11G0GQ0.2 "root;cellular organisms;Bacteria;Aquificae ;Aquificae;"

Please let me know precisely which variant of the output is missing.

keima · August 5, 2016, 7:32am

Dear Daniel

Thank you very much for your quick reply.
Actually, I wanted the “readName_to_taxonPath” file. I’m sorry, I simply overlooked the “File->Export->Export CSV” menu. Text files of taxonomy and function like these let me explore the results flexibly. Thank you.

However, the “readName_to_taxonPath” file I got from MEGAN6 doesn’t contain the weighted LCA%, whereas the “Taxonomic Paths” file in MEGAN5 contains the LCA% of each taxon. A file containing both the taxonomic path and the LCA% was very useful, because I could quickly check the taxonomic content at various LCA% thresholds (using R, for example). Can I get an exported file in MEGAN6 that contains the “read name”, “taxonomic path”, and “weighted LCA%”?

yueh · October 7, 2016, 1:43am

Hi Daniel,
The “readName_to_taxonPath” is what I need.
I am using MEGAN to parse the BLAST output table.
I want to annotate a FASTA file (the sequences are all singletons). I ran a BLAST search against the NCBI Reference Sequence database (RefSeq) and imported the BLAST output table, together with the NCBI RefSeq taxa map file, into MEGAN6.
I thought it would be easy to get the taxonomy (after LCA) for each query sequence. However, zero lines are always written to the text file. How can I successfully export the read names and taxonomic names?

(5 minutes later)
Hi again, I think the problem is solved: I hadn’t selected all nodes before exporting.

Daniel · October 7, 2016, 3:26pm

I’ve added a new CSV output format, readName_to_taxonPathPercent, that contains the percentages.

keima · November 28, 2016, 11:28am

I really appreciate the CSV output format “readName_to_taxonPathPercent”.
But I have two questions I’d like to ask you.

First, are the values shown in the CSV output the “weighted LCA%”?
When I imported my BLAST output file, I checked “Use weighted LCA” and set the value to 80.
Below are parts of the CSV output files in the “readName_to_taxonPath” and “readName_to_taxonPathPercent” formats.
As geneA and geneB show, the values in the “readName_to_taxonPathPercent” output seem unrelated to the assignment.

Second, what causes the difference in taxonomic paths between “readName_to_taxonPath” and “readName_to_taxonPathPercent”?
For geneC, the taxonomic path in “readName_to_taxonPath” seems to correspond to the tree in the taxonomic viewer, but the taxonomic path in “readName_to_taxonPathPercent” lacks some nodes.

--------------“readName_to_taxonPath”--------------
geneA "root;cellular organisms;Bacteria;"
geneB "root;cellular organisms;Bacteria;Proteobacteria;Alphaproteobacteria;Pelagibacterales;Pelagibacteraceae;Candidatus Pelagibacter;Candidatus Pelagibacter ubique;"
geneC "root;Viruses;dsDNA viruses, no RNA stage;Caudovirales;Myoviridae;Tevenvirinae;Schizot4virus;"

--------------“readName_to_taxonPathPercent”-----------------
geneA d__Bacteria; 99;p__Proteobacteria; 83;c__Gammaproteobacteria; 82;o__Pseudomonadales; 80;f__Pseudomonadaceae; 80;g__Pseudomonas; 80;s__Pseudomonas syringae group; 59;s__Pseudomonas syringae; 41;
geneB d__Bacteria; 100;p__Proteobacteria; 100;c__Alphaproteobacteria; 100;o__Pelagibacterales; 100;f__Pelagibacteraceae; 100;g__Candidatus Pelagibacter; 50;s__Candidatus Pelagibacter ubique; 50;
geneC o__Caudovirales; 100;f__Myoviridae; 100;g__Schizot4likevirus; 100;s__Vibrio phage VH7D; 33;

Daniel · November 28, 2016, 4:07pm

readName_to_taxonPathPercent has a number of additional features not seen in readName_to_taxonPath:

- It has a percent value at the end of the line. This refers to the percentage of high-scoring alignments for the given read that map to the last taxon on the path. It has nothing to do with the percentage used in the weighted LCA.
- It only reports taxa in the path that have an official KPCOFGS rank. Intermediate nodes that have no taxonomic rank, or a rank that does not belong to KPCOFGS, are suppressed.
- Each node is prefixed by letter__ to indicate the rank, e.g. g__ for genus, s__ for species.
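To work with this format outside MEGAN (for instance to apply your own percent cutoffs in a script, as asked earlier in the thread), each line can be split into rank prefix, taxon name and percent. This Python sketch is based only on the sample lines posted in this thread: it assumes the read name contains no whitespace and is separated from the path by a tab or space, and that each prefixed taxon is followed by its percent value. `parse_taxon_path_percent` is a hypothetical helper, not a MEGAN function.

```python
from typing import List, Tuple

Entry = Tuple[str, str, float]  # (rank prefix such as "g", taxon name, percent)


def parse_taxon_path_percent(line: str) -> Tuple[str, List[Entry]]:
    """Split one exported line into the read name and (rank, taxon, percent) entries."""
    # Assumption: read name and path are separated by the first run of whitespace.
    read_name, path = line.strip().split(None, 1)
    fields = [f.strip() for f in path.split(";")]
    if fields and fields[-1] == "":
        fields.pop()  # the path ends with ';'
    entries: List[Entry] = []
    # Fields alternate: "<rank>__<name>", then that node's percent value.
    for label, pct in zip(fields[0::2], fields[1::2]):
        rank, _, name = label.partition("__")
        entries.append((rank, name, float(pct)))
    return read_name, entries


if __name__ == "__main__":
    line = ("geneC o__Caudovirales; 100;f__Myoviridae; 100;"
            "g__Schizot4likevirus; 100;s__Vibrio phage VH7D; 33;")
    name, entries = parse_taxon_path_percent(line)
    print(name)
    for rank, taxon, pct in entries:
        print(rank, taxon, pct)
```

Taxon names with spaces, such as "Vibrio phage VH7D", survive because only the first whitespace run is treated as the separator.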

D

keima · November 30, 2016, 7:27am

Dear Daniel

Thank you for your quick reply.

I’m still a little confused.
So does the percent value in “readName_to_taxonPathPercent” correspond to the “naive LCA” used in MEGAN5? Or is the value unrelated to that, too? How should I interpret the value in taxonPathPercent?

Daniel · November 30, 2016, 10:16am

The taxon path percent is calculated independently of whichever algorithm was used to bin reads (be it the naive LCA or the weighted LCA).

For a given read r and taxonomic node t, the percentage is defined as follows:
Let N be the number of matches for r that pass all filter criteria (such as minScore, maxExpected, minPercentIdentity and topPercent), and let K be the number of such matches that correspond to taxon t. The percentage is then given by 100*(K / N).
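The definition can be written out directly. In this Python sketch each filtered match of a read r is represented by the taxon path of its reference, from the root down. We assume that a match "corresponds to" taxon t when its reference taxon is t or lies below t, which is consistent with the 100% values along the path described in the following paragraphs; MEGAN itself works with taxon IDs, whereas names are used here only for readability. `taxon_path_percent` is a hypothetical helper and the paths are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Path = Tuple[str, ...]  # one match's reference taxon path, root first


def taxon_path_percent(match_paths: Sequence[Path]) -> Dict[str, float]:
    """Return 100*(K/N) for every taxon t that occurs in a read's filtered matches.

    N: number of matches that passed the filters (minScore, maxExpected, ...).
    K: number of those matches whose reference taxon is t or lies below t
       (an assumed reading of "correspond to taxon t").
    """
    n = len(match_paths)
    if n == 0:
        return {}
    k: Counter = Counter()
    for path in match_paths:
        for taxon in set(path):  # count each match at most once per taxon
            k[taxon] += 1
    return {taxon: 100.0 * count / n for taxon, count in k.items()}


if __name__ == "__main__":
    matches = [
        ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
        ("root", "Bacteria", "Proteobacteria", "Alphaproteobacteria"),
        ("root", "Bacteria", "Firmicutes"),
        ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ]
    for taxon, pct in sorted(taxon_path_percent(matches).items(), key=lambda kv: -kv[1]):
        print(f"{taxon}: {pct:.0f}%")
```

With these four matches, root and Bacteria get 100%, Proteobacteria 75%, Gammaproteobacteria 50%, and Alphaproteobacteria and Firmicutes 25% each.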

So, for a read that is assigned to a species node s, say, using the naive LCA, this percentage will be 100% for all nodes from s all the way up to the root (because otherwise the naive LCA would have placed the read higher).

For a read that is assigned to a phylum node p, say, using the naive LCA, all nodes above will show 100%, while all nodes below will show less than 100% (because otherwise the read would have been placed lower).
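The relationship between the naive LCA and these percentages can be checked with a small Python sketch: the naive LCA is the longest common prefix of the match paths, every node on that prefix is passed by all N matches (100%), and the first node below it on any path is missed by at least one match (below 100%). The paths are illustrative, and `naive_lca` and `percent_along` are hypothetical helpers rather than MEGAN code.

```python
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]  # taxon path of one match, root first (illustrative names)


def naive_lca(match_paths: Sequence[Path]) -> Path:
    """Longest common prefix of all match paths, i.e. the naive LCA placement."""
    if not match_paths:
        return ()
    lca: List[str] = []
    # zip stops at the shortest path, so the LCA is never deeper than any match.
    for nodes in zip(*match_paths):
        if any(node != nodes[0] for node in nodes):
            break
        lca.append(nodes[0])
    return tuple(lca)


def percent_along(match_paths: Sequence[Path], path: Path) -> List[float]:
    """100*(K/N) for each node of `path`; K counts matches passing through that node."""
    n = len(match_paths)
    if n == 0:
        raise ValueError("a read needs at least one filtered match")
    return [
        100.0 * sum(1 for m in match_paths if m[: i + 1] == path[: i + 1]) / n
        for i in range(len(path))
    ]


if __name__ == "__main__":
    matches = [
        ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
        ("root", "Bacteria", "Proteobacteria", "Alphaproteobacteria"),
        ("root", "Bacteria", "Firmicutes"),
    ]
    lca = naive_lca(matches)
    print(lca, percent_along(matches, lca))
    print(percent_along(matches, ("root", "Bacteria", "Proteobacteria")))
```

Here the read lands on Bacteria: both nodes on that path are at 100%, while Proteobacteria, one level lower, is at about 67% because the Firmicutes match does not pass through it.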

keima · December 5, 2016, 6:42am

Dear Daniel

Thank you for the detailed explanation!
Now it becomes clear to me.

IrbinVeliz · December 10, 2021, 9:54am

Hello! This is my first time using MEGAN. I am using version 6.21.16 and I would like to export my data with the percentages of each rank. However, I cannot find this option in the new version. I have only managed to get the path for each OTU. Is there any way to do it? Thank you very much.

Daniel · January 20, 2022, 4:27pm

By default, MEGAN does not calculate “percentages” for each rank, because the naive LCA places each read above all organisms to which it aligns, so in that sense, the percentage is 100% for all nodes on the path to the placement of a read.
Alternatively, you could use the “weighted LCA” that places a read above 80% of all taxa to which it aligns, but the algorithm is quite slow and will struggle with large datasets.
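If you already have readName_to_taxonPathPercent output, one simple way to explore the idea of "placing a read above X% of what it aligns to" is to keep the deepest reported taxon whose percent meets a cutoff, which also makes it cheap to try several cutoffs. This Python sketch is only a plain threshold on the exported percentages; it is not MEGAN's weighted LCA and will not reproduce its assignments. It assumes entries are (rank, name, percent) triples in root-to-leaf order, and `deepest_at_threshold` is a hypothetical helper. Because the export only lists KPCOFGS-ranked taxa, the result is always one of those ranks.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, float]  # (rank prefix, taxon name, percent), root to leaf


def deepest_at_threshold(entries: List[Entry], threshold: float = 80.0) -> Optional[Entry]:
    """Return the deepest entry whose percent is >= threshold, or None if none is."""
    best: Optional[Entry] = None
    for entry in entries:
        if entry[2] >= threshold:
            best = entry
        else:
            # Under the K/N definition, a child's percent cannot exceed its parent's,
            # so nothing further down the path can pass the cutoff.
            break
    return best


if __name__ == "__main__":
    gene_a = [
        ("d", "Bacteria", 99.0), ("p", "Proteobacteria", 83.0),
        ("c", "Gammaproteobacteria", 82.0), ("o", "Pseudomonadales", 80.0),
        ("f", "Pseudomonadaceae", 80.0), ("g", "Pseudomonas", 80.0),
        ("s", "Pseudomonas syringae group", 59.0), ("s", "Pseudomonas syringae", 41.0),
    ]
    for cutoff in (80.0, 90.0, 100.0):
        print(cutoff, deepest_at_threshold(gene_a, cutoff))
```

For the geneA line posted earlier in the thread, a cutoff of 80 stops at g__Pseudomonas, 90 stops at d__Bacteria, and 100 returns nothing because Bacteria is listed at 99.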