Exporting number of assigned reads as CSV

Hi Daniel,

We are trying to export the number of reads assigned to a particular taxon in CSV format; however, the output always seems to contain the summarized counts. We can see the assigned numbers in the GUI, and they differ from the summarized ones. The problem arises during export: the numbers don’t match.

Procedure:
Highlight taxon level we wish to export
File -> Export -> csv format -> taxonPath_to_count (also tried taxonName_to_count) -> assigned -> tab -> filename

I am not sure why this is so and it seems to happen for both MEGAN5 (v5.11.3) and MEGAN6 (v6.4) versions.

Kindly advise.

Thanks for your help!

Maria

If you request assigned counts, MEGAN exports assigned counts unless the node is collapsed, in which case the summarized count is exported. To avoid the latter, uncollapse any node for which you want the assigned count.
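To make the difference concrete, here is a minimal Python sketch of the rule described above. It is not MEGAN code: the tree, the counts, and the function names `summarized` and `exported_count` are all made up for illustration. The summarized count of a node is taken to be its own assigned count plus the assigned counts of everything below it.

```python
# Toy taxonomy: each node has its own assigned count and a list of children.
# All names and numbers here are illustrative assumptions, not MEGAN output.
tree = {
    "Viruses": {"assigned": 5, "children": ["Anelloviridae", "Mimiviridae"]},
    "Anelloviridae": {"assigned": 3, "children": ["Seal anellovirus 4"]},
    "Seal anellovirus 4": {"assigned": 7, "children": []},
    "Mimiviridae": {"assigned": 2, "children": []},
}


def summarized(node):
    """Reads assigned to this node plus all reads assigned below it."""
    entry = tree[node]
    return entry["assigned"] + sum(summarized(child) for child in entry["children"])


def exported_count(node, collapsed):
    """Apply the rule from the reply: a collapsed node reports its summarized count."""
    if node in collapsed:
        return summarized(node)
    return tree[node]["assigned"]


# Uncollapsed: only the reads assigned directly to the node.
print(exported_count("Anelloviridae", collapsed=set()))
# Collapsed: the node also absorbs the reads of its descendants.
print(exported_count("Anelloviridae", collapsed={"Anelloviridae"}))
```

In this toy tree the same node yields two different numbers depending on whether it is collapsed, which is the kind of mismatch described in the original question.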

Dear Daniel, I also have a question on this topic. In my original library, the node that I am exporting has sum=27355 and ass=12158; however, when I open the extracted file (rma), it changes to sum=12144 and ass=2571. If I export FASTA, all the 27355 sequences are in the file. So what should I change to have all the 27355 sequences in the exported .rma file?

That doesn’t sound right.
Can you give me more details?

Hi Daniel,

Is there any way to export everything from the inspector to a .csv file (see example below)?

Anelloviridae [1]
tig00000237 [matches=9]
DATA
Lesser panda anellovirus; score=56.0
Seal anellovirus 4; score=54.0
Seal anellovirus 4; score=51.0
Torque teno midi virus 12; score=51.0
Torque teno midi virus 7; score=47.0
Seal anellovirus 4; score=35.0
Lesser panda anellovirus; score=31.0
Torque teno midi virus 11; score=29.0
Simian torque teno virus 31; score=23.0

In the inspector window you can select some taxa and then use the File->Export Selected Taxa… menu item.

But probably the better way to export counts is to use the File->Export->Text (CSV) Format… menu item.
This allows you to select nodes in the taxonomy (or a functional) viewer and then to export counts, reads, etc. in several different ways.

ok thank you Daniel!

Also, after I set my LCA parameters I get some viruses bolded in black lettering and some in light grey lettering, what does this mean?

Hi Daniel,

I was able to change the text file to a .csv file, and I was able to extract FASTA files for selected viral families by right-clicking the node on the tree and clicking “extract.” I was wondering now if there is a way to select a node and extract all the NCBI accession numbers? I only see the accession numbers listed in the inspector when you expand a node to see alignments. Is there an easier way to extract accession numbers? Thanks! Katie

This is where I see the accession numbers from the different nodes in the inspector window.

>YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
Length = 465

Score = 69 bits (166), Expect = 7e-11
Identities = 49/129 (38%), Positives = 68/129 (53%), Gaps = 11/129 (9%)
Frame = -3

Query: 386 RANPRGTPSHQVVPHPLPTPGSFEVHVHNNCLCNEYLSLRNRVLQQVPEP-LDT-----FVDEMRNLAHRVSTWLGKHTPSDGEWIQQYSGRKATMYRNAAADLMLVPFSRRDRYIKSFL 45
R R TP Q P G ++V ++ +E +SLRNR+L +P+P L+T FV EMR+L H+V T + + I +Y+G K T Y AA LM P ++RD YI FL
Sbjct: 4 RDKTRATPWKQYCFKSFP--GWYKVDYPSSTYIDEEVSLRNRILLPMPQPQLNTPQWLSFVREMRHLKHQVPI---VETLTRQQVILKYTGAKRTRYEKAAISLMTKPLNKRDSYIDCFL 118

Query: 44 KPEKISDPT 18
K EK+ T
Sbjct: 119 KVEKMPHET 127

I have an update. It looks like I can’t export all the blastx alignments with e-value, % identity, etc. from the inspector. However, I was able to export this information as .csv by highlighting all my viral families on the taxonomic tree. When you export from the inspector and change the .txt file to a .csv file, all the data is still organized in one column, which makes it hard to clean up for data analysis in R, so exporting from the taxonomic tree is the way to go. I am still in the process of learning how to visualize this same data using Diamond. I need to figure out the fastest way to create a table in Excel with the virus family, genus and species alignment, % identity, e-value, and protein aligned to my contigs, so I can create figures in R. Thanks!
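For anyone who still wants to work from the inspector text, here is a rough Python sketch that reshapes the one-column layout into rows. It assumes the text looks exactly like the Anelloviridae example pasted earlier in this thread (a `Taxon [n]` line, a `read [matches=n]` line, a `DATA` line, then `name; score=x` lines); the real export may differ, and the function name `inspector_to_rows` is just something made up for this example.

```python
import csv
import io
import re

# Sample copied from the inspector example earlier in the thread (shortened).
inspector_text = """\
Anelloviridae [1]
tig00000237 [matches=9]
DATA
Lesser panda anellovirus; score=56.0
Seal anellovirus 4; score=54.0
"""

# Line patterns are assumptions based on that pasted example only.
MATCH_LINE = re.compile(r"^(?P<name>.+); score=(?P<score>[\d.]+)$")
READ_LINE = re.compile(r"^(?P<read>\S+) \[matches=\d+\]$")
TAXON_LINE = re.compile(r"^(?P<taxon>.+) \[\d+\]$")


def inspector_to_rows(text):
    """Turn the nested one-column text into (taxon, read, match, score) rows."""
    rows = []
    taxon = None
    read = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "DATA":
            continue
        match = MATCH_LINE.match(line)
        if match:
            rows.append((taxon, read, match["name"], float(match["score"])))
            continue
        read_match = READ_LINE.match(line)
        if read_match:
            read = read_match["read"]
            continue
        taxon_match = TAXON_LINE.match(line)
        if taxon_match:
            taxon = taxon_match["taxon"]
    return rows


# Write the rows as CSV text; swap io.StringIO for an open file to save it.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["taxon", "read", "match", "score"])
writer.writerows(inspector_to_rows(inspector_text))
print(buffer.getvalue())
```

This only recovers what the inspector text contains (taxon, read, match name, score). It does not add accession numbers or e-values, which is why the export from the taxonomic tree is still the better route.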

@daniel What is taxon id?

# read-name taxon-id match-length bit-score percent-identity
tig00000237 2012640 0 56 37.8
tig00000237 1566011 0 54 32.7
tig00000237 1566011 0 51 32.6
tig00000237 2065053 0 51 32.2
tig00000237 2065048 0 47 31.7
tig00000237 1566011 0 35 50
tig00000237 2012640 0 31 22.2
tig00000237 2065052 0 29 27.1
Is it possible to get all the information in this file along with accession numbers and e-values?

Gray means that the bitscore for the alignment is more than 10% lower than the best bitscore for the read, and such alignments are not taken into account during analysis. This value is the “topPercent” threshold.
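As a rough illustration of that rule, here is a small Python sketch applied to the scores from the tig00000237 example earlier in the thread. This is not MEGAN's implementation: the function name `split_by_top_percent` is made up, and treating a score exactly on the cutoff as kept is an assumption.

```python
def split_by_top_percent(bit_scores, top_percent=10.0):
    """Split scores into those within top_percent of the best score and the rest."""
    if not bit_scores:
        return [], []
    best = max(bit_scores)
    # Alignments scoring below this cutoff are the ones shown in gray.
    cutoff = best * (1 - top_percent / 100.0)
    kept = [score for score in bit_scores if score >= cutoff]
    grayed = [score for score in bit_scores if score < cutoff]
    return kept, grayed


# Scores of the nine matches listed for tig00000237 in the inspector example.
scores = [56.0, 54.0, 51.0, 51.0, 47.0, 35.0, 31.0, 29.0, 23.0]
kept, grayed = split_by_top_percent(scores)
print("used:", kept)
print("gray:", grayed)
```

With a best score of 56 and a 10% threshold, the cutoff is about 50.4, so in this sketch the alignments scoring 56, 54, 51 and 51 would be used and the remaining five would be gray.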

Here is an example of all the data I would like exported from MEGAN.

Viral Family | Host | Virus Type | Contig Name | Contig Size | Viral Assignment | Protein | NCBI Accession # | aa identity % | e-value | Bit Score | # Viral Family Contigs | Blastx Hits
Anelloviridae | Vertebrate | ssDNA | tig00000237 | 1257 | Lesser panda anellovirus | ORF1 | >YP_009551687.1 | 38 | 2.00E-10 | 56 | 1 | 4
Anelloviridae | Vertebrate | ssDNA | tig00000237 | | Seal anellovirus 4 | ORF1 | >YP_009115496.1 | 33 | 4.00E-11 | 54 | |
Anelloviridae | Vertebrate | ssDNA | tig00000237 | | Seal anellovirus 4 | ORF2 | >YP_009115496.1 | 33 | 8.00E-06 | 51 | |
Anelloviridae | Vertebrate | ssDNA | tig00000237 | | Torque teno midi virus 12 | hypothetical protein | >YP_009505786.1 | 32 | 3.00E-07 | 51 | |

It would be nice if MEGAN could export all of this in Text (CSV) format, so I don’t have to do it manually in Excel. Thanks again for your help!

Thanks for the information, Daniel!

Hi Daniel,
Is there a way to export all the taxonomic information via the command-line version of MEGAN or the Windows version, in a .tsv or .csv file similar to the one DIAMOND produces? Diamond produces a perfect .tsv table from all my blastx hits against the NCBI database, but it did not provide order, subfamily, family, genus or species, so I ended up meganizing my blastx output via Diamond into a .daa file and loading it into MEGAN. I still can’t figure out how to generate the same table as Diamond with all the taxonomic information from the NCBI database. Can MEGAN do this? Below is an example.

qseqid sseqid pident length mismatch evalue bitscore staxids sscinames sskingdoms skingdoms sphylums stitle
95358305-35eb-424c-9f2b-800110c28af5 YP_004894356.1 32.1 78 53 1.21E-05 46.6 1094892 1094892 0 0 0 YP_004894356.1 DNA primase [Megavirus chiliensis]