80% of taxonomy not assigned

MinxiJiang · December 16, 2021, 10:41pm

Hi there,

I sincerely hope to get some help with this question.
I ran blastx with DIAMOND against the CAZy database and generated the aligned DAA files (around 6,400 sequences aligned). However, after I meganized the DAA files and mapped them using megan-map-Jan2021.db.zip, a huge fraction of the taxonomy (around 80%) was "Not assigned". I have double-checked my blastx output: the hits do include accession numbers, and I could find those accessions in the NCBI nr database.

I wonder whether this is caused by a problem with my data or by something wrong in my analysis process. Should I map using the accession options instead? Are there other ways I could confirm these results? Also, I used the default LCA parameters with a min score of 50. I decreased it to 10, but the results did not seem to change.

Best,
Minxi

Daniel · January 20, 2022, 4:32pm

I will look into aligning against the CAZy database. The issue is probably that the mapping file that you are using is optimized for NCBI-nr. You could try downloading and using the expanded mapping file:
https://software-ab.informatik.uni-tuebingen.de/download/megan6/megan-map-Jul2020-2-X.db.zip

This file is very large once unzipped, but it contains all accessions listed in NCBI-nr, not just the first one for each sequence.
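To illustrate why lowering the min score would not help when the hit accessions are simply missing from the mapping file, here is a minimal Python sketch. It assumes the alignments have been exported as BLAST-style tabular text (outfmt 6 column order, bitscore in column 12), that CAZy subject IDs carry the accession before the first `|`, and that you already have a set of accessions the mapping file knows about. `unassigned_fraction` is a hypothetical helper, and its rule (a read counts as assigned if any hit passes the min score and has a mapped accession) is a simplification, not MEGAN's actual LCA algorithm. The accession format used as keys in the real mapping file (for example, with or without a version suffix) is also an assumption to check.

```python
from collections import defaultdict


def unassigned_fraction(tabular_text, mapped_accessions, min_score=50.0):
    """Hypothetical helper: fraction of aligned reads that would get no taxon.

    Simplified rule (NOT MEGAN's real LCA): a read is 'assigned' if at least
    one of its hits has bitscore >= min_score AND an accession found in
    mapped_accessions. Reads whose hits all fall below min_score, or whose
    accessions are all unmapped, count as unassigned.
    """
    hits = defaultdict(list)
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        # BLAST tabular (outfmt 6) order: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        fields = line.split("\t")
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        # Assumption: CAZy-style subject IDs like "ABC12345.1|GH5" keep the
        # accession before the first '|'.
        accession = subject_id.split("|", 1)[0]
        hits[read_id].append((accession, bitscore))

    if not hits:
        return 0.0
    unassigned = sum(
        1
        for read_hits in hits.values()
        if not any(score >= min_score and acc in mapped_accessions
                   for acc, score in read_hits)
    )
    return unassigned / len(hits)


# Toy data (invented for illustration): three reads, one hit each.
example = (
    "read1\tABC12345.1|GH5\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t180.0\n"
    "read2\tXYZ99999.1|GT2\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-25\t150.0\n"
    "read3\tQRS55555.2|CBM50\t70.0\t80\t24\t0\t1\t240\t1\t80\t1e-10\t60.0\n"
)
# Pretend only one of the three accessions is present in the mapping file.
mapped = {"ABC12345.1"}
print(unassigned_fraction(example, mapped, min_score=50))  # → 0.6666666666666666
print(unassigned_fraction(example, mapped, min_score=10))  # → 0.6666666666666666
```

In this toy data, two of the three reads stay unassigned whether the min score is 50 or 10, because their hits already pass the threshold and fail only on the accession lookup. Only making the missing accessions available to the lookup, which is what the expanded mapping file does for accessions beyond the first one per sequence, changes the result.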