I use DIAMOND to BLAST NGS reads against a database of virus RefSeq sequences. I then process the output file to add taxon IDs to the BLAST hits (I wrote a script to do that) and import it into MEGAN, which on the whole works great.
However, quite a few reads end up in the Not Assigned bin, even though the taxa of their hits are in the taxonomy.
My main problem, though, is that I have two reads that hit Bat hepevirus (taxon ID 1216472).
Bat hepevirus is in the MEGAN taxonomy; I checked after MEGAN had loaded.
But these two sequences get assigned to the Viruses node, not the Bat hepevirus node.
The BLAST hits look like this:
M01569:148:000000000-AH5LR:1:2118:7100:18591/1 gi|400354813|ref|YP_006576507.1|tax|1216472| 78.9 57 12 0 1 171 1390 1446 3.4e-24 109.4 1216472
M01569:148:000000000-AH5LR:1:2118:7100:18591/2 gi|400354813|ref|YP_006576507.1|tax|1216472| 79.7 59 12 0 178 2 1388 1446 2.9e-26 116.3 1216472
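For anyone wanting to reproduce this format, a step that appends the taxon ID as a final column could look roughly like the sketch below. It is a minimal illustration and not the script actually used. It assumes tab-separated DIAMOND tabular output and a subject ID containing `|tax|<id>|` as in the hits above; the names `TAX_RE` and `append_taxid` are hypothetical.

```python
import re

# Assumption: the subject ID (column 2) embeds the taxon ID as "|tax|<digits>|",
# as in the hits shown above.
TAX_RE = re.compile(r"\|tax\|(\d+)\|")


def append_taxid(line):
    """Return the hit line with the taxon ID appended as an extra last column.

    Lines whose subject ID carries no "|tax|<digits>|" tag are returned
    unchanged (apart from the stripped trailing newline).
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if len(fields) < 2:
        return stripped
    match = TAX_RE.search(fields[1])
    if match is None:
        return stripped
    return "\t".join(fields + [match.group(1)])


if __name__ == "__main__":
    example = "\t".join([
        "M01569:148:000000000-AH5LR:1:2118:7100:18591/1",
        "gi|400354813|ref|YP_006576507.1|tax|1216472|",
        "78.9", "57", "12", "0", "1", "171", "1390", "1446",
        "3.4e-24", "109.4",
    ])
    print(append_taxid(example))
```

Comparing the appended column against the ID embedded in the subject name is a quick way to rule out a mismatch introduced by the post-processing step.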
Using the Inspector on the Viruses node to view the sequences gives:
Bat hepevirus; score=116.0
gi|400354813|ref|YP_006576507.1|tax|1216472|
Score = 116, Expect = 3e-26
M01569:148:000000000-AH5LR:1:2118:7100:18591/2 gi|400354813|ref|YP_006576507.1|tax|1216472| 79.7 59 12 0 178 2 1388 1446 2.9e-26 116.3 1216472
Bat hepevirus; score=109.0
gi|400354813|ref|YP_006576507.1|tax|1216472|
Score = 109, Expect = 3e-24
M01569:148:000000000-AH5LR:1:2118:7100:18591/1 gi|400354813|ref|YP_006576507.1|tax|1216472| 78.9 57 12 0 1 171 1390 1446 3.4e-24 109.4 1216472
Any ideas what's wrong? Is the taxon missing from the MEGAN taxonomy?
Relatively few sequences are assigned to the Viruses node (3 in total, 2 of which are these), but I also get a lot of reads in Not Assigned that I don’t think should be there.