I’d like to ask whether a query read can be assigned to more than one “leaf” (lowest-level category) in the SEED analysis.
I obtained taxonomic and functional annotations from the tab-delimited output of a blastp search against nr, using the mapping files prot_acc2tax-Aug2016.abin (taxonomy), acc2eggnog-June2016X.abin (eggNOG), gi2kegg-Feb2015X.bin (KEGG), and gi2seed-May2015X.bin (SEED).
I then exported CSV files listing each read name with its functional assignment at the lowest level of the hierarchy. I obtained them by selecting “Tree->Uncollapse All”, “Select->All Leaves”, “File->Export->CSV Format”, and then, for example, “readName_to_seedName”.
Only in the CSV output of the SEED analysis do some read names appear in more than one row.
For example, I found these lines:
gene300865 "Glutamate formiminotransferase (EC 2.1.2.5)"
gene300865 "5-FCL-like protein"
gene303946 "COG2363"
gene303946 "ThiJ/PfpI family protein"
Is this normal, and how should I deal with such reads?
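In case it helps to reproduce this, here is a minimal Python sketch of how the affected reads could be listed from the exported file. It is not part of MEGAN. The function name `reads_with_multiple_classes` and the file name `seed_export.txt` are hypothetical, and I am assuming a two-column export (read name, SEED name) separated by tabs; pass `delimiter=","` if the export used commas.

```python
import csv
from collections import defaultdict


def reads_with_multiple_classes(path, delimiter="\t"):
    """Return {read_name: [class names]} for reads listed in more than one row.

    Assumes each row has the read name in column 1 and the class name in
    column 2; the delimiter is an assumption and may need to be changed.
    """
    classes = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            read_name, class_name = row[0].strip(), row[1].strip()
            # keep each distinct class once, in order of first appearance
            if class_name not in classes[read_name]:
                classes[read_name].append(class_name)
    # keep only reads that received two or more distinct classes
    return {read: names for read, names in classes.items() if len(names) > 1}


# Hypothetical usage:
# for read, names in reads_with_multiple_classes("seed_export.txt").items():
#     print(read, names)
```

This only reports which reads have several SEED names; it does not decide which assignment to keep.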
I’m sorry, I don’t understand the SEED classification well. I have heard that some “leaves” belong to more than one higher-level category, but can a read also be assigned to more than one “leaf”?
Also, could you tell me where I can find the table of SEED hierarchical categories that MEGAN refers to?
(Thank you for the new CSV output format readName_to_taxonPathPercent. It really helped me!)