Getting a tree with accession numbers

I’m blasting virus ASVs. At the moment they are all assigned to a single species.
My MEGAN input is default BLAST XML output, and I get a single node for the virus species.
But there’s additional info in the actual BLAST match name (strain number, for example)
that would be useful for looking into diversity.

Is there a way to get MEGAN to export the actual BLAST match names, or their accession numbers?
I understand this could mean that a single read will contribute to multiple names/numbers.
At the moment, having just taxon name and taxonid is too restrictive for my dataset.

thanks.

Could you supply me with some of the full virus names? I will look into why these are not showing up in the MEGAN analysis.

Try this dropbox link for the XML input and rma.

You’ll see from the XML (and from inspecting reads in MEGAN) that one of the matches is “Human echovirus 19 strain Djum/91 5' non-translated region, partial sequence”. MEGAN only displays “Human Echovirus”, even in exports. Ideally I’d want at least “Human echovirus 19 strain Djum/91”.

Alternatively, if MEGAN can export the accession numbers for the top hits, I can probably parse that info elsewhere.
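In the meantime, the match names and accession numbers can be pulled straight from the BLAST XML, outside MEGAN. Below is a minimal sketch, assuming the standard BLAST XML tags (`Iteration_query-def`, `Hit_accession`, `Hit_def`) and that hits are listed best-first, as BLAST normally writes them. The function name `top_hits`, the accessions and the second hit name in the sample are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def top_hits(xml_source, max_hits=1):
    """Map each read name to its first `max_hits` (accession, match name) pairs."""
    hits_by_read = {}
    for iteration in ET.parse(xml_source).iter("Iteration"):
        read = iteration.findtext("Iteration_query-def")
        hits = iteration.findall("Iteration_hits/Hit")[:max_hits]
        hits_by_read[read] = [
            (hit.findtext("Hit_accession"), hit.findtext("Hit_def")) for hit in hits
        ]
    return hits_by_read


# Tiny stand-in for a real blast.xml; accessions are placeholders, not real records.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-def>ASV1</Iteration_query-def>
  <Iteration_hits>
    <Hit>
      <Hit_def>Human echovirus 19 strain Djum/91 5' non-translated region, partial sequence</Hit_def>
      <Hit_accession>ACC0001</Hit_accession>
    </Hit>
    <Hit>
      <Hit_def>Hypothetical second match</Hit_def>
      <Hit_accession>ACC0002</Hit_accession>
    </Hit>
  </Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

result = top_hits(io.StringIO(SAMPLE_XML), max_hits=1)
for read, hits in result.items():
    for accession, name in hits:
        print(read, accession, name, sep="\t")
```

For a real run, pass the path to the XML file instead of the `io.StringIO` wrapper, and raise `max_hits` if a read should be allowed to contribute to several names.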

Here’s a screenshot of what I’m looking for.
Many reads are BLAST-matched to more specific virus names, but perhaps MEGAN only displays down to ‘species’ level. Even exporting will not give me the full match names (e.g., I’d like to show that this sample is assigned to Coxsackievirus A6).

Perhaps there is a way to parse beyond species level?

Ah, it seems I managed to get Coxsackievirus to show up by increasing the bitscore filter.
However, my original question (getting more details on Human Echovirus) remains.

Do you know how to export all that information (posted above) to a .csv file? I want to export all my alignment information (family, genus, species) with bit score, identities, e-values, and length. What can I choose in the .csv export dropdown menu to get this information? I don’t need the alignment sequences. Thanks! @Daniel

I have looked into this. First, the NCBI taxonomy only goes down to Echovirus E19; there are no nodes below that node in the taxonomy. Using Extended Mode and Parse Taxon Names will produce a classification down to the level:

Perhaps this will help you:

Menu item: File → Export → Text…
Select format: readName_to_taxonMatches

This will write out the matches like this:

# read-name taxon-id match-length bit-score percent-identity
ASV1 47507 0 201.0 100.0
ASV1 45101 0 173.0 96.2
ASV1 0 0 173.0 98.0
ASV1 12073 0 171.0 97.1
ASV1 318571 0 171.0 95.4
ASV1 12078 0 169.0 95.3

Hope that helps
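Once exported, that file is easy to post-process. Here is a small sketch, assuming the five whitespace-separated columns shown in the header line above and read names without spaces. The function name `read_taxon_matches` is made up for illustration, and the data is the example output from this post.

```python
from collections import defaultdict


def read_taxon_matches(lines):
    """Group (taxon_id, bit_score, percent_identity) tuples by read name."""
    matches = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        read, taxon_id, _match_length, bit_score, identity = line.split()
        matches[read].append((int(taxon_id), float(bit_score), float(identity)))
    return matches


EXPORT = """# read-name taxon-id match-length bit-score percent-identity
ASV1 47507 0 201.0 100.0
ASV1 45101 0 173.0 96.2
ASV1 0 0 173.0 98.0
ASV1 12073 0 171.0 97.1
ASV1 318571 0 171.0 95.4
ASV1 12078 0 169.0 95.3
"""

matches = read_taxon_matches(EXPORT.splitlines())

# Keep the highest-scoring match per read.
best = {read: max(rows, key=lambda row: row[1]) for read, rows in matches.items()}
print(best["ASV1"])
```

For a real export, replace `EXPORT.splitlines()` with an open file handle.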

Hi Daniel,

I actually like the way the readName_to_taxonMatches export separates into tabs in Excel. I just wish this export gave me the e-value, NCBI accession number, protein name, and assignment name. I am trying to put together a table for my paper that has all this information.

See example below:

Viral Family Virus Host Virus Type Viral Family Contig(s) Contig Name Contig Size Viral Assignment Protein NCBI Accession # aa identity % e-value Bit Score Blastx Hits
Parvoviridae Vertebrate ssDNA tig00000031 Ambidensovirus NS3, NS1, NS2, structural proteins KY548840.1 76.052 2.02E-140 512 1
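One possible workaround for a table like this is to build it from the BLAST XML directly rather than from a MEGAN export. The sketch below assumes the standard BLAST XML HSP tags (`Hsp_bit-score`, `Hsp_evalue`, `Hsp_identity`, `Hsp_align-len`) and reports only the first HSP of each hit. The column names, the function names and all values in the sample are made up for illustration. Taxonomy columns such as family or host are not in the XML and would have to come from elsewhere.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["read", "accession", "hit_name", "bit_score", "e_value",
          "identity_pct", "align_len"]


def hit_rows(xml_source):
    """Yield one row per hit, using the first HSP listed for that hit."""
    for iteration in ET.parse(xml_source).iter("Iteration"):
        read = iteration.findtext("Iteration_query-def")
        for hit in iteration.findall("Iteration_hits/Hit"):
            hsp = hit.find("Hit_hsps/Hsp")
            if hsp is None:
                continue
            align_len = int(hsp.findtext("Hsp_align-len"))
            identical = int(hsp.findtext("Hsp_identity"))
            yield {
                "read": read,
                "accession": hit.findtext("Hit_accession"),
                "hit_name": hit.findtext("Hit_def"),
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "e_value": hsp.findtext("Hsp_evalue"),  # kept as text
                "identity_pct": round(100 * identical / align_len, 1),
                "align_len": align_len,
            }


def write_csv(xml_source, out):
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(hit_rows(xml_source))


# Placeholder input: the accession and numbers are invented for the demo.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration>
<Iteration_query-def>contig_1</Iteration_query-def>
<Iteration_hits><Hit>
  <Hit_def>Hypothetical virus, partial sequence</Hit_def>
  <Hit_accession>ACC0001</Hit_accession>
  <Hit_hsps><Hsp>
    <Hsp_bit-score>201.0</Hsp_bit-score>
    <Hsp_evalue>1e-50</Hsp_evalue>
    <Hsp_identity>98</Hsp_identity>
    <Hsp_align-len>100</Hsp_align-len>
  </Hsp></Hit_hsps>
</Hit></Iteration_hits>
</Iteration></BlastOutput_iterations></BlastOutput>"""

buffer = io.StringIO()
write_csv(io.StringIO(SAMPLE_XML), buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("hits.csv", "w", newline="")` and pass that handle as `out`.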

Hi all,

Did anyone manage to resolve this somehow? I have a similar issue dealing with viruses.

Revisiting the same process with a new viral dataset!

Previously I’ve just been using blast.xml files as input into MEGAN,
but I seem to get better resolution when I include a meganmap file (I think the megan-nucl one).

It’s not sorting the viruses 100%, but I do get significantly more reads at lower ranks.