Exporting #. of assigned reads as csv

mariayungpy · June 3, 2016, 5:23am

Hi Daniel,

We are trying to export the number of assigned reads for a particular taxon to CSV format; however, the output always seems to contain the summarized counts. We can see the assigned numbers in the GUI, and they differ from the summarized ones. The problem arises during export: the exported numbers don’t match the assigned counts shown in the GUI.

Procedure:
Highlight taxon level we wish to export
File -> Export -> csv format -> taxonPath_to_count (also tried taxonName_to_count) -> assigned -> tab -> filename

I am not sure why this happens; it occurs in both MEGAN5 (v5.11.3) and MEGAN6 (v6.4).

Kindly advise.

Thanks for your help!

Maria

Daniel · June 3, 2016, 5:08pm

If you request assigned counts, then MEGAN exports assigned counts, unless the node is collapsed, in which case the summarized count is exported. To avoid the latter, uncollapse any node for which you don’t want the summarized count.
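
A minimal sketch of the rule described above, assuming a toy taxonomy with made-up names and counts (this is not MEGAN code, and the function names are hypothetical). The summarized count of a node is its own assigned count plus the counts of everything below it, and a collapsed node reports that summarized value on export:

```python
# Toy taxonomy: reads assigned directly to each node, plus parent -> children links.
# All names and counts are made up for illustration.
assigned = {"Viruses": 5, "Anelloviridae": 3, "Seal anellovirus 4": 7}
children = {"Viruses": ["Anelloviridae"], "Anelloviridae": ["Seal anellovirus 4"]}


def summarized(node):
    """Reads assigned to the node itself plus all reads assigned below it."""
    return assigned.get(node, 0) + sum(summarized(c) for c in children.get(node, []))


def exported_count(node, collapsed):
    """Collapsed nodes report the summarized count, all others the assigned count."""
    return summarized(node) if node in collapsed else assigned.get(node, 0)


print(exported_count("Anelloviridae", collapsed=set()))              # 3
print(exported_count("Anelloviridae", collapsed={"Anelloviridae"}))  # 10
```

The same node yields two different numbers depending only on whether it is collapsed, which matches the mismatch reported in the first post.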

jana_rigonato · February 9, 2017, 3:17pm

Dear Daniel, I also have a question on this topic. In my original library, the node that I am exporting has sum=27355 and ass=12158; however, when I open the extracted file (rma), these change to sum=12144 and ass=2571. If I export FASTA, are all the 27355 sequences in the file? And what should I change to have all the 27355 sequences in the exported .rma file?

Daniel · February 28, 2017, 9:43am

That doesn’t sound right
Can you give me more details?

lucyintheskyzzz · January 16, 2023, 5:13pm

Hi Daniel,

Is there any way to export everything from the inspector to a .csv file (see example below)?

Anelloviridae [1]
tig00000237 [matches=9]
DATA
Lesser panda anellovirus; score=56.0
Seal anellovirus 4; score=54.0
Seal anellovirus 4; score=51.0
Torque teno midi virus 12; score=51.0
Torque teno midi virus 7; score=47.0
Seal anellovirus 4; score=35.0
Lesser panda anellovirus; score=31.0
Torque teno midi virus 11; score=29.0
Simian torque teno virus 31; score=23.0

Daniel · January 17, 2023, 4:04pm

In the inspector window you can select some taxa and then use the File->Export Selected Taxa… menu item.

But probably the better way to export counts is to use the File->Export->Text (CSV) Format… menu item.
This allows you to select nodes in the taxonomy (or a functional) viewer and then to export counts, reads, etc in several different ways.

lucyintheskyzzz · January 18, 2023, 7:34pm

ok thank you Daniel!

Also, after I set my LCA parameters, some viruses appear in bold black lettering and some in light grey lettering. What does this mean?

lucyintheskyzzz · January 22, 2023, 1:37am

Hi Daniel,

I was able to change the text file to a .csv file, and I was able to extract FASTA files for selected viral families by right-clicking the node on the tree and clicking “extract.” I was wondering now if there is a way to select a node and extract all the NCBI accession numbers? I only see the accession numbers listed in the inspector when I expand a node to see its alignments. Is there an easier way to extract accession numbers? Thanks! Katie

This is where I see the accession numbers from the different nodes in the inspector window.

>YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
Length = 465

Score = 69 bits (166), Expect = 7e-11
Identities = 49/129 (38%), Positives = 68/129 (53%), Gaps = 11/129 (9%)
Frame = -3

Query: 386 RANPRGTPSHQVVPHPLPTPGSFEVHVHNNCLCNEYLSLRNRVLQQVPEP-LDT-----FVDEMRNLAHRVSTWLGKHTPSDGEWIQQYSGRKATMYRNAAADLMLVPFSRRDRYIKSFL 45
R R TP Q P G ++V ++ +E +SLRNR+L +P+P L+T FV EMR+L H+V T + + I +Y+G K T Y AA LM P ++RD YI FL
Sbjct: 4 RDKTRATPWKQYCFKSFP--GWYKVDYPSSTYIDEEVSLRNRILLPMPQPQLNTPQWLSFVREMRHLKHQVPI---VETLTRQQVILKYTGAKRTRYEKAAISLMTKPLNKRDSYIDCFL 118

Query: 44 KPEKISDPT 18
K EK+ T
Sbjct: 119 KVEKMPHET 127
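
One possible workaround, sketched here under the assumption that the alignment text has been copied or saved from the inspector as plain text: pull the accession, protein description and organism out of each `>` header line with a regular expression. The function name `parse_headers` is hypothetical, and the pattern only covers headers shaped like the one shown above.

```python
import re

# Header lines in the BLAST-style text look like:
#   >YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
HEADER = re.compile(
    r"^>(?P<accession>\S+)\s+(?P<protein>.*?)\s*\[(?P<organism>[^\]]+)\]\s*$"
)


def parse_headers(text):
    """Return one dict per '>' header line found in the pasted text."""
    rows = []
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


sample = """>YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
Length = 465

Score = 69 bits (166), Expect = 7e-11
"""

for row in parse_headers(sample):
    # Tab-separated output opens cleanly in R or a spreadsheet.
    print(row["accession"], row["protein"], row["organism"], sep="\t")
```

Lines that are not headers (Length, Score, the alignment itself) are simply skipped, so the whole inspector text can be passed in at once.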

lucyintheskyzzz · January 28, 2023, 4:01pm

I have an update. It looks like I can’t export all the blastx alignments with e-value, % identity, etc. from the inspector. However, I was able to export this information as .csv by highlighting all my viral families on the taxonomic tree. When you export from the inspector and change the .txt file to a .csv file, all the data is still organized in one column, which makes it hard to clean up for data analysis in R, so exporting from the taxonomic tree is the way to go. I am still in the process of learning how to visualize this same data using Diamond. I need to figure out the fastest way to create a table in Excel with the virus family, genus and species of each alignment, % identity, e-value, and the protein aligned to my contigs, so I can create figures in R. Thanks!

lucyintheskyzzz · January 30, 2023, 2:01am

@daniel What is taxon id?

# read-name	taxon-id	match-length	bit-score	percent-identity
tig00000237	2012640	0	56	37.8
tig00000237	1566011	0	54	32.7
tig00000237	1566011	0	51	32.6
tig00000237	2065053	0	51	32.2
tig00000237	2065048	0	47	31.7
tig00000237	1566011	0	35	50
tig00000237	2012640	0	31	22.2
tig00000237	2065052	0	29	27.1
Is it possible to get all the information in this file together with accession numbers and e-values?
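
For context, the taxon-id column holds NCBI Taxonomy identifiers. A minimal sketch of reading such an export in Python and attaching names is shown below. The `TAXON_NAMES` lookup is an assumption, inferred by lining up the bit scores with the inspector listing earlier in the thread; a real workflow would load names from the NCBI taxonomy dump (names.dmp) instead. The function name `read_export` is hypothetical.

```python
import csv
import io

# First rows of the export pasted above (tab-separated, header starts with '#').
export = (
    "# read-name\ttaxon-id\tmatch-length\tbit-score\tpercent-identity\n"
    "tig00000237\t2012640\t0\t56\t37.8\n"
    "tig00000237\t1566011\t0\t54\t32.7\n"
    "tig00000237\t2065053\t0\t51\t32.2\n"
)

# Assumed lookup, inferred from the inspector listing; not an official mapping.
TAXON_NAMES = {
    "2012640": "Lesser panda anellovirus",
    "1566011": "Seal anellovirus 4",
    "2065053": "Torque teno midi virus 12",
}


def read_export(text):
    """Parse the export into a list of dicts keyed by the header names."""
    lines = text.splitlines()
    header = lines[0].lstrip("# ").split("\t")
    body = io.StringIO("\n".join(lines[1:]))
    return list(csv.DictReader(body, fieldnames=header, delimiter="\t"))


rows = read_export(export)
for row in rows:
    row["taxon-name"] = TAXON_NAMES.get(row["taxon-id"], "unknown")
    print(row["read-name"], row["taxon-id"], row["taxon-name"], row["bit-score"], sep="\t")
```

Accession numbers and e-values are not present in this export, so they would have to come from another source, such as the alignment text or the original DIAMOND output.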

Daniel · February 2, 2023, 11:25am

Gray means that the bitscore for the alignment is more than 10% lower than the best bitscore for the read, and such alignments are not taken into account during analysis. This value is the “topPercent” threshold.
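
As a rough illustration of that threshold (a sketch only, not MEGAN's implementation; whether the boundary is inclusive and how other LCA parameters interact are assumptions here), the bit scores listed in the inspector example earlier in the thread can be split like this:

```python
def split_by_top_percent(bit_scores, top_percent=10.0):
    """Return (used, gray) relative to the best bit score for one read."""
    best = max(bit_scores)
    cutoff = best * (1.0 - top_percent / 100.0)
    used = [s for s in bit_scores if s >= cutoff]
    gray = [s for s in bit_scores if s < cutoff]
    return used, gray


# Bit scores for tig00000237 as shown in the inspector listing.
scores = [56.0, 54.0, 51.0, 51.0, 47.0, 35.0, 31.0, 29.0, 23.0]
used, gray = split_by_top_percent(scores)
print("used:", used)
print("gray:", gray)
```

With a best score of 56 and a 10% threshold, the cutoff is about 50.4, so the alignments scoring 56, 54, 51 and 51 would be used and the remaining five would be shown in gray.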

lucyintheskyzzz · February 8, 2023, 8:38pm

Here is an example of all the data I would like exported from MEGAN.

Viral Family	Host	Virus Type	Contig Name	Contig Size	Viral Assignment	Protein	NCBI Accession #	aa identity %	e-value	Bit Score	# Viral Family Contigs	Blastx Hits
Anelloviridae	Vertebrate	ssDNA	tig00000237	1257	Lesser panda anellovirus	ORF1	>YP_009551687.1	38	2.00E-10	56	1	4
Anelloviridae	Vertebrate	ssDNA	tig00000237		Seal anellovirus 4	ORF1	>YP_009115496.1	33	4.00E-11	54
Anelloviridae	Vertebrate	ssDNA	tig00000237		Seal anellovirus 4	ORF2	>YP_009115496.1	33	8.00E-06	51
Anelloviridae	Vertebrate	ssDNA	tig00000237		Torque teno midi virus 12	hypothetical protein	>YP_009505786.1	32	3.00E-07	51

It would be nice if MEGAN could export all this as text (CSV), so I don’t have to do it manually in Excel. Thanks again for your help!

lucyintheskyzzz · February 8, 2023, 8:39pm

Thanks for the information, Daniel!

lucyintheskyzzz · October 22, 2023, 8:26pm

Hi Daniel,
Is there a way to export all the taxonomic information via the command-line version of MEGAN or the Windows version, giving me a .tsv or .csv file similar to DIAMOND’s? DIAMOND produces a perfect .tsv table from all my blastx hits against the NCBI database, but it did not provide order, subfamily, family, genus or species, so I ended up meganizing my blastx output via DIAMOND into a .daa file and uploaded it to MEGAN. I still can’t figure out how to generate the same table as DIAMOND with all the taxonomic information from the NCBI database. Can MEGAN do this? Below is an example.

qseqid	sseqid	pident	length	mismatch	evalue	bitscore	staxids	sscinames	sskingdoms	skingdoms	sphylums	stitle
95358305-35eb-424c-9f2b-800110c28af5	YP_004894356.1	32.1	78	53	1.21E-05	46.6	1094892	1094892	0	0	0	YP_004894356.1 DNA primase [Megavirus chiliensis]
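
One way to approach this outside MEGAN, sketched under stated assumptions: if a two-column table mapping each read name to a taxon path can be obtained (the table, its format and the path shown below are hypothetical and illustrative, not actual MEGAN output), it can be joined to the DIAMOND table on the read name. Only a few of the DIAMOND columns are kept here for brevity.

```python
import csv
import io
import sys

# A subset of the DIAMOND columns from the example row above.
diamond_tsv = (
    "qseqid\tsseqid\tpident\tevalue\tbitscore\n"
    "95358305-35eb-424c-9f2b-800110c28af5\tYP_004894356.1\t32.1\t1.21E-05\t46.6\n"
)

# Hypothetical read-name -> taxon-path table; the path is illustrative only.
read_to_path_tsv = (
    "95358305-35eb-424c-9f2b-800110c28af5\tViruses;Mimiviridae;Megavirus chiliensis\n"
)


def load_paths(text):
    """Build a dict from read name to taxon path."""
    return {name: path for name, path in csv.reader(io.StringIO(text), delimiter="\t")}


def join_hits(hits_text, paths):
    """Add a taxon_path column to every DIAMOND hit (empty if the read is unknown)."""
    merged = []
    for hit in csv.DictReader(io.StringIO(hits_text), delimiter="\t"):
        hit["taxon_path"] = paths.get(hit["qseqid"], "")
        merged.append(hit)
    return merged


merged = join_hits(diamond_tsv, load_paths(read_to_path_tsv))
writer = csv.DictWriter(sys.stdout, fieldnames=list(merged[0]), delimiter="\t")
writer.writeheader()
writer.writerows(merged)
```

The joined table keeps one row per blastx hit, so a read with several hits simply repeats its taxon path on each row.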