Hello,
I have built a DIAMOND nr database using:
diamond makedb --in nr.gz \
  --db nr_diamond \
  --taxonmap prot.accession2taxid.FULL \
  --taxonnodes nodes.dmp \
  --taxonnames names.dmp \
  --threads 72
and ran the annotation with:
diamond blastp -p 80 -c1 -b60 \
  -d nr_diamond.dmnd \
  -q sample1.spades.genes.faa \
  -o sample1.faa.diamond.daa \
  -f 100 -k 5 --salltitles -e 0.000001
However, when I try to get taxIDs from the DAA file, I get "Error: Taxonomy features are not supported for the DAA format." I raised the issue on GitHub, and the reply confirmed what the error message says: taxonomy information cannot be retrieved from a DAA file.
I then tried MEGAN, but I do not want to use the LCA algorithm in my case. Instead, I want the first of the five hits for each query, provided it satisfies the thresholds I supply, but I could not find the right parameters. I also tried a manual script, but it was very slow, and the DAA output does not give me taxon IDs anyway. To match protein accessions to taxon IDs I have to use the "prot.accession2taxid.FULL" file, and searching it separately for each query takes a very long time.
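A minimal sketch of a one-pass alternative to searching the mapping file per query. It assumes the hits are first converted to tabular form (for example with diamond view -a sample1.faa.diamond.daa -f 6 -o sample1.tsv, which writes the standard columns but no taxonomy columns), that all hits of a query appear on consecutive lines with the best hit first, and that prot.accession2taxid.FULL has two tab-separated columns (accession.version, taxid). The function name top_hit_taxids, the file names, and the optional identity cutoff are hypothetical. The idea is to collect the accession of the first qualifying hit per query, then read the large mapping file once and keep only those accessions in a hash table.

```shell
#!/bin/sh
# Sketch: taxID of the first qualifying hit per query, one pass over the mapping file.
# Assumptions (hypothetical, not from DIAMOND or MEGAN documentation):
#   $1  DIAMOND tabular output (-f 6 default columns: qseqid, sseqid, pident, ...),
#       all hits of a query on consecutive lines, best hit first
#   $2  two-column table accession.version<TAB>taxid (e.g. prot.accession2taxid.FULL)
#   $3  optional minimum percent identity (default 0)
top_hit_taxids() {
    awk -F'\t' -v OFS='\t' -v minid="${3:-0}" '
        # Pass 1 (hits file): keep the first row per query that meets the
        # identity cutoff and remember which accessions must be looked up.
        FILENAME == ARGV[1] {
            if ($3 + 0 < minid + 0 || ($1 in first)) next
            first[$1] = $2
            order[++n] = $1
            need[$2] = 1
            next
        }
        # Pass 2 (mapping file): read it once, storing only needed accessions.
        ($1 in need) { taxid[$1] = $2 }
        # Print query, accession, taxID in input order; NA if no mapping found.
        END {
            for (i = 1; i <= n; i++) {
                q = order[i]
                acc = first[q]
                print q, acc, ((acc in taxid) ? taxid[acc] : "NA")
            }
        }
    ' "$1" "$2"
}
```

Example call: top_hit_taxids sample1.tsv prot.accession2taxid.FULL 50 > sample1.top_taxids.tsv. The e-value cutoff is already applied by DIAMOND (-e), so only extra thresholds such as identity need to be checked here. If the mapping file is gzipped, decompress it first (or pass <(zcat file.gz) in bash).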
How can I get the protein accession and taxID (and, if possible, the full lineage) of the first hit for each query in MEGAN, without LCA?
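For the lineage part, one possible approach outside MEGAN (a sketch, not a MEGAN feature) is to walk the NCBI taxonomy tree in nodes.dmp and names.dmp, the same files passed to diamond makedb above. It assumes the standard taxdump layout, where fields are separated by tab-pipe-tab and names.dmp marks one "scientific name" per node. The function name add_lineage and its arguments are hypothetical. It appends a semicolon-separated lineage (from just below the root down to the taxon) to each row of a tab-separated table, or NA when the taxID is not found.

```shell
#!/bin/sh
# Sketch: append a full lineage to each row of a TSV, using NCBI taxdump files.
# Assumptions (hypothetical): $1 = nodes.dmp, $2 = names.dmp,
# $3 = tab-separated table, $4 = number of the column holding the taxID.
add_lineage() {
    awk -F'\t' -v OFS='\t' -v col="$4" '
        # nodes.dmp: field 1 = taxID, field 2 = parent taxID.
        FILENAME == ARGV[1] { split($0, f, /\t[|]\t/); parent[f[1]] = f[2]; next }
        # names.dmp: keep only the scientific name of each node.
        FILENAME == ARGV[2] {
            split($0, f, /\t[|]\t/)
            if (f[4] ~ /^scientific name/) name[f[1]] = f[2]
            next
        }
        # Table rows: walk up the tree until the root (taxID 1) is reached.
        {
            t = $col
            lineage = ""
            while ((t in parent) && t != "1") {
                lineage = (lineage == "") ? name[t] : name[t] ";" lineage
                if (t == parent[t]) break   # guard against self-loops
                t = parent[t]
            }
            print $0, (lineage == "" ? "NA" : lineage)
        }
    ' "$1" "$2" "$3"
}
```

Loading the whole of nodes.dmp and names.dmp into awk arrays needs some memory, but it is done once, after which each lineage is a short chain of hash lookups rather than a file search.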
Thank you in advance.
Best regards