Exporting number of assigned reads as CSV

Hi Daniel,

We are trying to export the number of assigned reads for a particular taxon to CSV format; however, the output always seems to contain summarized counts. We can see the assigned numbers in the GUI, and they differ from the summarized ones. The problem arises during the export: the numbers do not match.

Procedure:
Highlight taxon level we wish to export
File -> Export -> csv format -> taxonPath_to_count (also tried taxonName_to_count) -> assigned -> tab -> filename

I am not sure why this is so; it seems to happen in both MEGAN5 (v5.11.3) and MEGAN6 (v6.4).

Kindly advise.

Thanks for your help!

Maria

If you request assigned counts, then MEGAN exports assigned counts, unless the node is collapsed, in which case the summarized count is exported. To avoid the latter, uncollapse any such node for which you don’t want this to kick in.
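To make the rule above concrete, here is a minimal sketch in Python. It is not MEGAN's code: the `Node` class, the `exported_count` helper, and the counts are all hypothetical. It assumes that a node's summarized count is its own assigned count plus the assigned counts of everything below it, and that a collapsed node reports the summarized count as described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a taxonomy node; not a MEGAN class."""
    name: str
    assigned: int                      # reads assigned directly to this node
    children: List["Node"] = field(default_factory=list)
    collapsed: bool = False

    def summarized(self) -> int:
        # Assumption: summarized = own assigned reads + all reads below.
        return self.assigned + sum(c.summarized() for c in self.children)


def exported_count(node: Node) -> int:
    """Value an 'assigned' export would report under the rule described above."""
    return node.summarized() if node.collapsed else node.assigned


# Made-up numbers, for illustration only.
family = Node("FamilyX", assigned=5, children=[
    Node("GenusA", assigned=3),
    Node("GenusB", assigned=2),
])

family.collapsed = True
collapsed_value = exported_count(family)    # summarized: 5 + 3 + 2
family.collapsed = False
uncollapsed_value = exported_count(family)  # assigned only: 5
print(collapsed_value, uncollapsed_value)
```

In this toy tree the same node yields two different numbers depending only on whether it is collapsed, which matches the mismatch reported in the original question.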

Dear Daniel, I also have a question on this topic. In my original library, the node that I am exporting shows sum=27355 and ass=12158; however, when I open the extracted file (rma), this changes to sum=12144 and ass=2571. If I export FASTA, all the 27355 sequences are in the file? So what should I change to have all the 27355 sequences in the exported .rma file?

That doesn’t sound right.
Can you give me more details?

Hi Daniel,

Is there any way to export everything from the inspector to a .csv file (see example below)?

Anelloviridae [1]
tig00000237 [matches=9]
DATA
Lesser panda anellovirus; score=56.0
Seal anellovirus 4; score=54.0
Seal anellovirus 4; score=51.0
Torque teno midi virus 12; score=51.0
Torque teno midi virus 7; score=47.0
Seal anellovirus 4; score=35.0
Lesser panda anellovirus; score=31.0
Torque teno midi virus 11; score=29.0
Simian torque teno virus 31; score=23.0

In the inspector window you can select some taxa and then use the File->Export Selected Taxa… menu item.

But probably the better way to export counts is to use the File->Export->Text (CSV) Format… menu item.
This allows you to select nodes in the taxonomy (or a functional) viewer and then to export counts, reads, etc. in several different ways.

ok thank you Daniel!

Also, after I set my LCA parameters I get some viruses bolded in black lettering and some in light grey lettering, what does this mean?

Hi Daniel,

I was able to change the text file to a .csv file, and I was able to extract FASTA files for selected viral families by right-clicking the node on the tree and clicking “extract.” I was wondering now if there is a way to select a node and extract all the NCBI accession numbers? I only see the accession numbers listed in the inspector when you expand a node to see alignments. Is there an easier way to extract accession numbers? Thanks! Katie

This is where I see the accession numbers from the different nodes in the inspector window.

>YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
Length = 465

Score = 69 bits (166), Expect = 7e-11
Identities = 49/129 (38%), Positives = 68/129 (53%), Gaps = 11/129 (9%)
Frame = -3

Query: 386 RANPRGTPSHQVVPHPLPTPGSFEVHVHNNCLCNEYLSLRNRVLQQVPEP-LDT-----FVDEMRNLAHRVSTWLGKHTPSDGEWIQQYSGRKATMYRNAAADLMLVPFSRRDRYIKSFL 45
R R TP Q P G ++V ++ +E +SLRNR+L +P+P L+T FV EMR+L H+V T + + I +Y+G K T Y AA LM P ++RD YI FL
Sbjct: 4 RDKTRATPWKQYCFKSFP--GWYKVDYPSSTYIDEEVSLRNRILLPMPQPQLNTPQWLSFVREMRHLKHQVPI---VETLTRQQVILKYTGAKRTRYEKAAISLMTKPLNKRDSYIDCFL 118

Query: 44 KPEKISDPT 18
K EK+ T
Sbjct: 119 KVEKMPHET 127
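If alignment text like the block above has been saved to a plain-text file, one possible workaround is to pull the accession numbers out with a short script. The following is only a sketch in Python: it assumes each reference starts on a line of the form `>ACCESSION description`, as in the example above, and the function name `extract_accessions` is made up for illustration. It is not a MEGAN feature.

```python
import re
from typing import List, Tuple

# Header lines are assumed to look like: >ACCESSION free-text description
HEADER = re.compile(r"^>(\S+)\s*(.*)$")


def extract_accessions(text: str) -> List[Tuple[str, str]]:
    """Return (accession, description) for each '>' header line in text."""
    hits = []
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


# Sample taken from the alignment shown above.
sample = """>YP_009336956.1 hypothetical protein 2 [Hubei tombus-like virus 11]
Length = 465

Score = 69 bits (166), Expect = 7e-11
"""
accessions = extract_accessions(sample)
print(accessions)
```

Lines that do not begin with `>` (lengths, scores, the alignment rows themselves) are ignored, so the same function can be run over a file that holds many alignments.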

I have an update. It looks like I can’t export all the blastx alignments with e-value, % identity, etc. from the inspector. However, I was able to export this information as .csv by highlighting all my viral families on the taxonomic tree. When you export from the inspector and change the .txt file to a .csv file, all the data is still organized in one column, which makes it hard to clean up for data analysis in R, so exporting from the taxonomic tree is the way to go. I am still in the process of learning how to visualize this same data using DIAMOND. I need to figure out the fastest way to create a table in Excel with my virus family, genus and species alignment, % identity, e-value, and protein aligned to my contigs, so I can create figures in R. Thanks!
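For anyone who still needs the one-column inspector text in tabular form, the sketch below shows one way it could be reshaped in Python. It assumes the layout shown earlier in this thread (a `Taxon [n]` line, a `read [matches=n]` line, a `DATA` marker, then `name; score=x` lines). The function name `inspector_to_rows` and the column names are hypothetical, and other inspector layouts may need different patterns.

```python
import csv
import io
import re

# Patterns for the three kinds of lines in the assumed layout.
TAXON = re.compile(r"^(.+?)\s*\[(\d+)\]$")
READ = re.compile(r"^(.+?)\s*\[matches=(\d+)\]$")
MATCH = re.compile(r"^(.+);\s*score=([0-9.]+)$")


def inspector_to_rows(text):
    """Turn nested inspector text into flat [taxon, read, match, score] rows."""
    rows = []
    taxon = read = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "DATA":
            continue
        m = READ.match(line)
        if m:
            read = m.group(1)
            continue
        m = TAXON.match(line)
        if m:
            taxon, read = m.group(1), None
            continue
        m = MATCH.match(line)
        if m:
            rows.append([taxon, read, m.group(1), float(m.group(2))])
    return rows


# First lines of the example posted earlier in the thread.
sample = """Anelloviridae [1]
tig00000237 [matches=9]
DATA
Lesser panda anellovirus; score=56.0
Seal anellovirus 4; score=54.0
Seal anellovirus 4; score=51.0
"""
rows = inspector_to_rows(sample)

# Write the rows as real CSV (here into memory; a file handle works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["taxon", "read", "match", "score"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each match becomes its own row with the taxon and read repeated, which is the long format that R functions such as `read.csv` can load directly.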

@daniel What is the taxon id?

# read-name taxon-id match-length bit-score percent-identity
tig00000237 2012640 0 56 37.8
tig00000237 1566011 0 54 32.7
tig00000237 1566011 0 51 32.6
tig00000237 2065053 0 51 32.2
tig00000237 2065048 0 47 31.7
tig00000237 1566011 0 35 50
tig00000237 2012640 0 31 22.2
tig00000237 2065052 0 29 27.1
Is it possible to get all the information in this file together with accession numbers and e-values?