While running blast2lca (version 6.8.13, built 24 Jun 2017) with default parameters, I end up with taxonomies that have missing ranks. For example, the following two cases don't show the family rank:
k141_17708_1; ;d__Bacteria; 100;p__Firmicutes; 100;c__Bacilli; 100;o__Bacillales; 100;g__Exiguobacterium; 50;s__Exiguobacterium enclense; 25;
k141_32800_1; ;d__Bacteria; 100;p__Actinobacteria ; 100;c__Actinobacteria; 100;o__Corynebacteriales; 100;g__Lawsonella; 100;s__Lawsonella clevelandensis; 100;
What's wrong?
OK, I figured out that these taxa don't have a family rank in the NCBI taxonomy. Would it be possible to generate an empty “f__ unknown” entry in such cases?
Good idea, I have just implemented this feature. Missing ranks will be reported as “unknown” whenever MEGAN’s CSV export data “readName_to_taxonPathPercent” is selected. Also, this feature is active in rma2info and daa2info when using the command line options --paths and --majorRanksOnly.
I will upload the new release later this week.
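Until the new release is installed, existing blast2lca output can be patched after the fact. The following is only a minimal Python sketch, not MEGAN's implementation. It assumes the line layout shown in the examples above (read name, an empty field, then alternating rank-prefixed taxon and percent fields). The function name `fill_missing_ranks`, the prefix list, and the choice to give a placeholder the percent of the next deeper assigned rank are all assumptions made for illustration.

```python
# Rank prefixes in the order they appear in the example output above.
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]

def fill_missing_ranks(line, placeholder="unknown"):
    """Insert '<prefix>__unknown' for ranks skipped above the deepest assigned rank."""
    fields = [f.strip() for f in line.strip().split(";")]
    read_name, rest = fields[0], fields[1:]

    # Map each rank prefix to its (taxon, percent) pair.
    found = {}
    for i, field in enumerate(rest):
        prefix, sep, _ = field.partition("__")
        if sep and prefix in RANK_PREFIXES:
            percent = rest[i + 1] if i + 1 < len(rest) else ""
            found[prefix] = (field, percent)
    if not found:
        return line.strip()

    # Walk from the deepest assigned rank upwards so that a placeholder can
    # reuse the percent of the next deeper assigned rank (an assumption).
    deepest = max(RANK_PREFIXES.index(p) for p in found)
    out = []
    percent = ""
    for prefix in reversed(RANK_PREFIXES[: deepest + 1]):
        if prefix in found:
            taxon, percent = found[prefix]
        else:
            taxon = f"{prefix}__{placeholder}"
        out.append(f"{taxon}; {percent};")
    return f"{read_name}; ;" + "".join(reversed(out))

line = ("k141_17708_1; ;d__Bacteria; 100;p__Firmicutes; 100;c__Bacilli; 100;"
        "o__Bacillales; 100;g__Exiguobacterium; 50;s__Exiguobacterium enclense; 25;")
print(fill_missing_ranks(line))
```

Ranks below the deepest assigned one are left out rather than padded, so a read assigned only to order level stays at order level.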
When we ran the rma2info command on a .rma6 file we got a different taxonomy than when we looked at the OTU within the GUI application and selected a certain rank.
For example, we ran the rma2info command and got this taxonomy:
DUP_10303;size=37 [SK] Eukaryota; [P] unknown; [C] unknown; [O] Pavlovales; [F] Pavlovaceae; [G] Pavlova;
However, when I look at that same OTU in the .rma6 file and group by Class, MEGAN6 puts this OTU under “Haptophyceae” (see attached image).
Do you know what might be going on? The commands we used are below.
./blast2rma --in 18S_anni_10_BLASTed.xml --format BlastXML --blastMode BlastN --out meganfile.rma6 --minScore 140 --maxExpected 1e-25 --topPercent 3 --minSupport 1 --lcaAlgorithm naive --lcaCoveragePercent 80
./rma2info --in meganfile.rma6 --read2class Taxonomy --paths --majorRanksOnly > taxa.txt
The “Haptophyceae” taxon does not have an official rank, which is why it doesn't appear when you run rma2info with official ranks only.
When you uncollapse to Class in MEGAN, this node appears because it is a sibling of other nodes that are ranked Class.
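To illustrate why the node drops out of the path, here is a small Python sketch. It is not MEGAN's code. The lineage is a simplified, hand-written example in which only Haptophyceae is marked as unranked, and the helper name `major_rank_path` is hypothetical. It keeps only nodes that carry a major rank and reports the skipped ranks as “unknown”.

```python
# Major ranks and the bracket codes used in the rma2info output above.
MAJOR_RANKS = [
    ("superkingdom", "SK"), ("phylum", "P"), ("class", "C"),
    ("order", "O"), ("family", "F"), ("genus", "G"), ("species", "S"),
]

def major_rank_path(lineage, placeholder="unknown"):
    """Build a path from (name, rank) pairs, keeping major ranks only."""
    by_rank = {rank: name for name, rank in lineage}
    path = [(code, by_rank.get(rank)) for rank, code in MAJOR_RANKS]
    # Drop ranks below the deepest assigned one instead of padding them.
    while path and path[-1][1] is None:
        path.pop()
    return " ".join(f"[{code}] {name or placeholder};" for code, name in path)

# Simplified lineage (an assumption for illustration): the unranked node
# sits where a class would be, but carries no official rank.
lineage = [
    ("Eukaryota", "superkingdom"),
    ("Haptophyceae", "no rank"),
    ("Pavlovales", "order"),
    ("Pavlovaceae", "family"),
    ("Pavlova", "genus"),
]
print(major_rank_path(lineage))
```

Because “Haptophyceae” is never looked up under any major rank, it disappears from the path, and phylum and class are filled with the placeholder.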
I came across a similar problem. When I export my data as a BIOM file (using officialRanksOnly=true), some entries are missing a rank (family is missing in the example below).
1378 d__Bacteria p__Firmicutes c__Bacilli o__Bacillales g__Gemella NA
Is it possible to introduce an NA for the missing rank as well? The current output causes some trouble with downstream analysis.
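As a stopgap for downstream tools that expect one column per rank, the taxonomy labels could be realigned by their prefix. This is a rough Python sketch under a few assumptions: the labels use the d__/p__/c__/o__/f__/g__/s__ prefixes seen in the example, the row is split on whitespace purely for demonstration (species names containing spaces would need the real BIOM taxonomy list instead), and `pad_taxonomy` is a made-up helper name.

```python
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]

def pad_taxonomy(labels, missing="NA"):
    """Return exactly one label per rank, using `missing` where none is present."""
    by_prefix = {}
    for label in labels:
        prefix, sep, _ = label.partition("__")
        if sep and prefix in RANK_PREFIXES:
            by_prefix[prefix] = label
    return [by_prefix.get(p, missing) for p in RANK_PREFIXES]

row = "1378 d__Bacteria p__Firmicutes c__Bacilli o__Bacillales g__Gemella NA".split()
otu_id, labels = row[0], row[1:]
print(otu_id, *pad_taxonomy(labels))
```

With this realignment the genus stays in the genus column and the family column holds NA, instead of every label after the gap shifting one column to the left.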