Getting a tree with accession numbers

OmarKR · December 6, 2022, 4:28am

I’m BLASTing virus ASVs. At the moment they are all a single species.
My MEGAN input is a default XML BLAST output. I get a single node for the virus species.
But there’s additional info in the actual BLAST match name (strain number, for example)
that would be useful for looking into diversity.

Is there a way to get MEGAN to export the actual BLAST match names, or their accession numbers?
I understand this could mean that a single read will contribute to multiple names/numbers.
At the moment, having just taxon name and taxon ID is too restrictive for my dataset.

Thanks.

Daniel · December 6, 2022, 2:32pm

Could you supply me with some of the full virus names? I will look into why these are not showing up in the MEGAN analysis.

OmarKR · December 7, 2022, 1:09am

Try this Dropbox link for the XML input and RMA file.

You’ll see from the XML (and from inspecting reads in MEGAN) that one of the matches is “Human echovirus 19 strain Djum/91 5' non-translated region, partial sequence”. MEGAN only displays “Human Echovirus”, even in exports. Ideally I’d want at least “Human echovirus 19 strain Djum/91”.

Alternatively, if MEGAN can export the accession numbers for the top hits, I can probably parse that info elsewhere.
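
A minimal sketch of the "parse it elsewhere" route, assuming the input is standard BLAST XML (`-outfmt 5`), where each `Iteration` holds one query and each `Hit` carries a `Hit_accession` and a `Hit_def`. It reads the BLAST file directly and is not a MEGAN feature. The function name `top_hits`, the 10% bit-score window, the placeholder accessions, and the second hit in the sample are all hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET


def top_hits(xml_text, top_percent=10.0):
    """Return {query: [(accession, description, bit_score), ...]}.

    Keeps only hits whose best HSP bit score lies within `top_percent`
    of the best bit score seen for that query.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = []
        for hit in iteration.iter("Hit"):
            scores = [float(s.text) for s in hit.iter("Hsp_bit-score")]
            if not scores:
                continue  # a hit without HSPs carries no score to compare
            hits.append((hit.findtext("Hit_accession"),
                         hit.findtext("Hit_def"),
                         max(scores)))
        if not hits:
            result[query] = []
            continue
        best = max(score for _, _, score in hits)
        cutoff = best * (1.0 - top_percent / 100.0)
        result[query] = [h for h in hits if h[2] >= cutoff]
    return result


# Illustrative sample only: accessions are placeholders, not real records.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>ASV1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>Human echovirus 19 strain Djum/91 5' non-translated region, partial sequence</Hit_def>
          <Hit_accession>ACC0001</Hit_accession>
          <Hit_hsps><Hsp><Hsp_bit-score>201.0</Hsp_bit-score></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical second hit</Hit_def>
          <Hit_accession>ACC0002</Hit_accession>
          <Hit_hsps><Hsp><Hsp_bit-score>173.0</Hsp_bit-score></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

hits = top_hits(SAMPLE_XML)
for query, kept in hits.items():
    for accession, description, score in kept:
        print(query, accession, score, description, sep="\t")
```

For a real file, the text could be read with `open(path).read()` before calling `top_hits`. Because a read can keep several hits, one read may map to several names or accessions, as noted in the first post.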

OmarKR · December 12, 2022, 1:50am

Here’s a screenshot of what I’m looking for.
Many reads get BLAST-matched to higher-level virus levels, but perhaps MEGAN only displays down to ‘species’ level. Even exporting will not give me the full match names (e.g., I’d like to show that this sample is assigned to Coxsackievirus A6).

Perhaps there is a way to parse beyond species level?

OmarKR · December 12, 2022, 2:05am

Ah, it seems I managed to get Coxsackievirus to show up by increasing the bit-score filter.
However, my original question (getting more details on Human Echovirus) remains.

lucyintheskyzzz · January 30, 2023, 1:58am

Do you know how to export all that information (posted above) to a .csv file? I want to export all my alignment information (family, genus, species) with bit scores, identities, e-values, and lengths. What can I choose in the .csv export dropdown menu to get this information? I don’t need the alignment sequences. Thanks! @Daniel

Daniel · February 2, 2023, 1:27pm

I have looked into this. First, the NCBI taxonomy only goes down to Echovirus E19; there are no nodes below that node in the taxonomy. Using Extended Mode and Parse Taxon Names will produce a classification down to the level:

Perhaps this will help you:

Menu item: File → Export → Text…
Select format: readName_to_taxonMatches

This will write out the matches like this:

# read-name taxon-id match-length bit-score percent-identity
ASV1 47507 0 201.0 100.0
ASV1 45101 0 173.0 96.2
ASV1 0 0 173.0 98.0
ASV1 12073 0 171.0 97.1
ASV1 318571 0 171.0 95.4
ASV1 12078 0 169.0 95.3
…

Hope that helps
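
The export shown above can be post-processed with a few lines of Python. This is a minimal sketch, assuming the columns are whitespace-separated and in the order given by the header line; the function name `read_taxon_matches` is hypothetical, and the sample is the six rows listed above. Treating taxon id 0 as "no taxon assigned to this match" is also an assumption.

```python
from collections import defaultdict


def read_taxon_matches(lines):
    """Group rows of a readName_to_taxonMatches export by read name.

    Expected columns per row:
    read-name taxon-id match-length bit-score percent-identity.
    Comment lines, blank lines, and rows without five fields are skipped.
    """
    by_read = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 5:
            continue
        name, taxon_id, length, bit_score, identity = fields
        by_read[name].append({
            "taxon_id": int(taxon_id),
            "match_length": int(length),
            "bit_score": float(bit_score),
            "percent_identity": float(identity),
        })
    return dict(by_read)


SAMPLE = """\
# read-name taxon-id match-length bit-score percent-identity
ASV1 47507 0 201.0 100.0
ASV1 45101 0 173.0 96.2
ASV1 0 0 173.0 98.0
ASV1 12073 0 171.0 97.1
ASV1 318571 0 171.0 95.4
ASV1 12078 0 169.0 95.3
"""

matches = read_taxon_matches(SAMPLE.splitlines())
# Taxon ids matched by ASV1, skipping id 0 (assumed to mean "unassigned").
taxa = [m["taxon_id"] for m in matches["ASV1"] if m["taxon_id"] != 0]
print(taxa)
```

The resulting taxon ids could then be looked up in the NCBI taxonomy to see which taxa each read matched, rather than only the single node it was assigned to.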

lucyintheskyzzz · February 8, 2023, 8:38pm

Hi Daniel,

I actually like the way the readName_to_taxonMatches export separates into columns in Excel. I just wish this export gave me the e-value, NCBI accession number, protein name, and assignment name. I am trying to put together a table for my paper that has all this information.

See example below:

Viral Family	Virus Host	Virus Type	Viral Family Contig(s)	Contig Name	Contig Size	Viral Assignment	Protein	NCBI Accession #	aa identity %	e-value	Bit Score	Blastx Hits
Parvoviridae	Vertebrate	ssDNA		tig00000031		Ambidensovirus	NS3, NS1, NS2, structural proteins	KY548840.1	76.052	2.02E-140	512	1
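
Some of the columns in the table above (accession, hit description, identity %, e-value, bit score) are present in the BLAST output itself, so one option is to build that part of the table outside MEGAN. The following is a minimal sketch, assuming the BLAST results are available as XML (`-outfmt 5`); the function names, column names, and sample values are hypothetical. Columns such as viral family or host are not in the BLAST XML and would still have to come from elsewhere.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["query", "accession", "description", "identity_pct",
          "align_len", "evalue", "bit_score"]


def blast_xml_to_rows(xml_text):
    """Return one dict per HSP with the alignment statistics."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                identical = int(hsp.findtext("Hsp_identity"))
                align_len = int(hsp.findtext("Hsp_align-len"))
                rows.append({
                    "query": query,
                    "accession": hit.findtext("Hit_accession"),
                    "description": hit.findtext("Hit_def"),
                    # percent identity = identical positions / alignment length
                    "identity_pct": round(100.0 * identical / align_len, 3),
                    "align_len": align_len,
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "bit_score": float(hsp.findtext("Hsp_bit-score")),
                })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative sample only: every value below is a placeholder.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>contig1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical protein, example virus</Hit_def>
          <Hit_accession>ACC0001</Hit_accession>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>200.0</Hsp_bit-score>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_identity>80</Hsp_identity>
              <Hsp_align-len>100</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

rows = blast_xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The CSV text can be written to a file and opened in Excel. Joining it with a MEGAN export on the read or contig name would be one way to add the assigned taxon to each row.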

timzladen · January 17, 2024, 1:21pm

Hi all,

Did anyone manage to resolve this somehow? I have a similar issue dealing with viruses.

OmarKR · March 22, 2024, 7:26am

Revisiting the same process with a new viral dataset!

Previously I’ve just been using blast.xml files as input into MEGAN.
But I seem to get better resolution when I include a meganmap file. I think it was the megan-nucl one.

It’s not sorting the viruses 100%, but I do get significantly more reads in lower ranks.