Export the table from the Long read inspector

CLAY · October 5, 2018, 8:51pm

Dear MEGAN members:

I am new to this program, and I am using it to view the blastn results for my assembled contigs. In the Long read inspector, I found a table (with the columns read, length, assignment, %cover, and #alignments) that I would like to export. However, I could not find a way to export it. I think I might have missed something. Could anyone help me export this table?

Thank you for your kind help!

Best wishes,

CY

Daniel · October 14, 2018, 12:40pm

It is currently not possible to export this to a file, but as a workaround, use “select all” to select all the rows, then copy and “paste” them into a text editor.

Daniel · October 14, 2018, 1:03pm

I have just added a new File->Export->Selection…
feature that can be used to export whatever is selected in the long read inspector to a file. It will be available with the next update later this week.

Daniel · October 15, 2018, 12:01pm

Release 6_12_6 has this feature

CLAY · October 15, 2018, 1:09pm

Thank you Daniel!

I will try this version as soon as possible.

CY

CLAY · October 22, 2018, 6:00pm

Hi, Daniel:

It works with the new version. Thank you so much for the new feature!
May I suggest another possible feature for the next version? In the table of the Long read inspector, it might be worthwhile to show the identity% of the blast result that corresponds to the Assignment column (or of the result with the highest coverage). This could be a useful feature for users.

Thank you so much for developing this wonderful program!

Best Wishes,

CY

lucyintheskyzzz · January 11, 2023, 2:41am

Hi Daniel,

I managed to cut and paste my long read inspector data into Notepad++. Could you explain what each number means? Matches is the number of alignments to the respective contig, but what does the bigger number mean?

Thanks! KV

read assignment mode = aligned bases

dolphinfeces.flye.medaka.consensus.blastx.rma6
Taxonomy
Autographiviridae [1,548] (what is this number??)
contig_101 [matches=7]
DATA
uncultured phage_MedDCM-OCT-S45-C18; score=127.0
Nonlabens phage P12024L; score=91.0
Nonlabens phage P12024S; score=91.0
Rhodobacter phage RcapMu; score=88.0
Synechococcus phage S-CAM4; score=86.0
Synechococcus phage ACG-2014j; score=86.0
Phage MedPE-SWcel-C56; score=69.0
contig_50 [matches=1]
DATA
uncultured phage_MedDCM-OCT-S45-C18; score=247.0
Myoviridae [1,266]
Podoviridae [582]
Salasmaviridae [2,190]
Siphoviridae [33,237]
Inoviridae [1,980]
Microviridae [14,990]
Parvoviridae [4,503]
Circoviridae [1,335]
unclassified RNA viruses ShiM-2016 [759]
Iridoviridae [897]

Daniel · January 11, 2023, 7:33am

The larger number in brackets is the “class size”. Its interpretation depends on which “read assignment mode” is set. In your case, this is set to “aligned bases”, so the number is the total number of aligned bases.
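For anyone who wants these class sizes as a table, here is a minimal Python sketch. It assumes the pasted inspector text has the layout shown in the post above: a class line such as "Siphoviridae [33,237]", optionally followed by read lines such as "contig_101 [matches=7]". The function name parse_inspector_text and the rule that a read belongs to the most recent class line are assumptions made for illustration; they are not part of MEGAN.

```python
import re

# A read line looks like "contig_101 [matches=7]".
READ_LINE = re.compile(r"^(?P<read>\S+)\s+\[matches=(?P<n>\d+)\]")
# A class line looks like "Siphoviridae [33,237]" (name, then a bracketed integer).
CLASS_LINE = re.compile(r"^(?P<name>.+?)\s+\[(?P<size>[\d,]+)\]")


def parse_inspector_text(text):
    """Return (class_sizes, read_matches) parsed from pasted inspector text.

    class_sizes maps class name -> class size (thousands separators removed).
    read_matches is a list of (class name, read name, number of matches).
    Assumption: a read line belongs to the most recent class line above it.
    """
    class_sizes = {}
    read_matches = []
    current_class = None
    for raw in text.splitlines():
        line = raw.strip()
        read = READ_LINE.match(line)
        if read:
            read_matches.append((current_class, read["read"], int(read["n"])))
            continue
        cls = CLASS_LINE.match(line)
        if cls:
            current_class = cls["name"]
            class_sizes[current_class] = int(cls["size"].replace(",", ""))
    return class_sizes, read_matches


# Sample text copied from the post above (shortened).
sample = """Taxonomy
Autographiviridae [1,548] (what is this number??)
contig_101 [matches=7]
DATA
uncultured phage_MedDCM-OCT-S45-C18; score=127.0
contig_50 [matches=1]
Myoviridae [1,266]
Siphoviridae [33,237]
"""

sizes, reads = parse_inspector_text(sample)
for name, size in sizes.items():
    print(f"{name}\t{size}")
```

With the read assignment mode set to “aligned bases”, the second column printed here is the number of aligned bases per class; with a different mode the same number would need to be read differently.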

lucyintheskyzzz · January 12, 2023, 2:10am

Thank you. Is there a way to export the total number of blastx hits (alignments) to different viral families?

lucyintheskyzzz · January 12, 2023, 4:34am

Hi Daniel,
Would one use the total number of aligned bases to see how much of a genome was recovered by X number of contigs?

I am wondering how I can use MEGAN to extract contigs for genome assembly. Is it best to extract all the contigs that align to a particular virus family, genus, or species, then download the reference genomes and run a genome alignment?

Daniel · January 12, 2023, 8:30am

You can also set the read assignment mode to “bases”, which counts the bases assigned to a taxon, not just the aligned bases. We introduced the “number of aligned bases” for early versions of Nanopore sequencing, where reads sometimes appeared to contain large stretches of “garbage” bases… It seemed safer to only count bases that align to something…

I’m not sure I understand what you mean by extract contigs for assembly? Do you mean long reads rather than contigs? Or do you want to try to assemble the (already assembled) contigs? Either way, MEGAN allows you to select nodes in the taxonomy view and then save all assigned reads/contigs to a file. You can use %t and %i in the supplied file name to put things into files whose names contain the taxon name (placeholder %t) and/or taxon id (placeholder %i).

lucyintheskyzzz · January 18, 2023, 9:24pm

Hi Daniel,

I mean, can I export only the contigs from the viral families of interest if I want to do an alignment with a reference genome using bowtie (or some other alignment tool)? For example, if I have a lot of alignments to one particular virus, I would want to extract those contigs and align them to a reference genome.

lucyintheskyzzz · January 21, 2023, 9:24pm

Hi Daniel,

I figured out how to export the selected viral families from the inspector. I just changed the .txt extension to .csv and that worked. Thanks!

Daniel · February 2, 2023, 1:42pm

If you want to save the contigs assigned to a particular node, select the node in the main taxonomy viewer (or one of the other classification viewers) and then use the following menu item:

File->Extract Reads…

This will write all reads (or contigs) assigned to one or more selected nodes to text files. You can use the placeholders %f, %t and %i in the specified file names to have reads written to different files. The placeholders are replaced by the input file name, class name and class id, respectively.
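To make the naming scheme concrete, here is a small Python sketch of how such a file name template could expand. It only illustrates the substitution described above and is not MEGAN's own code; the function name expand_placeholders, the example template and the class id 12345 are made up for illustration, and MEGAN may treat details such as spaces in class names differently.

```python
import re


def expand_placeholders(template, file_name, class_name, class_id):
    """Replace %f, %t and %i in a file name template (illustration only).

    %f -> input file name, %t -> class name, %i -> class id.
    A single regex pass is used so that substituted values are never
    scanned again for placeholders.
    """
    values = {"%f": file_name, "%t": class_name, "%i": str(class_id)}
    return re.sub(r"%[fti]", lambda m: values[m.group(0)], template)


# One output file per selected node, named after input file, class and id.
# The id 12345 is a made-up example value, not a real taxon id.
print(expand_placeholders("%f-%t-%i.txt", "sample", "Siphoviridae", 12345))
```

A template without any placeholder yields the same name for every selected node, so at least one of %t or %i is needed to get one file per node.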

lucyintheskyzzz · February 8, 2023, 8:34pm

Thanks Daniel! I will try this out!

lucyintheskyzzz · October 22, 2023, 8:01pm

Hi Daniel,
Is there a way to export all the taxonomic information via the command-line version of MEGAN or the Windows version, so that I get a .tsv or .csv file similar to DIAMOND's? DIAMOND produces a perfect .tsv table from all my blastx hits against the NCBI database, but it did not provide order, subfamily, family, genus or species, so I ended up meganizing my blastx results via DIAMOND into a .daa file and loading it into MEGAN. However, I still can’t figure out how to generate the same table as DIAMOND with all the taxonomic information from the NCBI database. Can MEGAN do this? Below is an example:

qseqid	sseqid	pident	length	mismatch	evalue	bitscore	staxids	sscinames	sskingdoms	skingdoms	sphylums	stitle
tig00000086	YP_001426684.1	100	40	0	4.50E-19	84.3	322019	Acanthocystis turfacea chlorella virus 1	Viruses	Bamfordvirae	Nucleocytoviricota	YP_001426684.1 ubiquitin family protein [Acanthocystis turfacea chlorella virus 1]

Daniel · October 24, 2023, 5:00pm

I think I understand what you want: for each read, a list of matches, and for each match, its taxon assignment?

One way to get this is to use File->Export->Annotations in GFF3 format…

This produces this format:

contig_842	MEGAN	CDS	789	2795	1246	+	0	Id=WP_032529350.1; acc=WP_032529350.1; tax=Bacteroides_fragilis; taxRel=above; Name=Bacteroides fragilis;
contig_842	MEGAN	CDS	4111	4657	106	-	2	Id=MBC34808.1; acc=MBC34808.1;
contig_842	MEGAN	CDS	6879	7935	582	+	0	Id=WP_146333102.1; acc=WP_146333102.1; tax=Bacteroides_fragilis; taxRel=above; Name=Bacteroides fragilis;
contig_842	MEGAN	CDS	8161	8512	92	+	1	Id=MBE6274036.1; acc=MBE6274036.1;
contig_842	MEGAN	CDS	8533	10731	1343	+	1	Id=WP_022348127.1; acc=WP_022348127.1; tax=Bacteria; taxRel=above; Name=Bacteria;
contig_842	MEGAN	CDS	11628	12736	703	-	0	Id=MZM33204.1; acc=MZM33204.1;
contig_842	MEGAN	CDS	12747	13121	225	-	0	Id=WP_032529351.1; acc=WP_032529351.1; tax=Bacteria; taxRel=above; Name=Bacteria;
contig_842	MEGAN	CDS	13162	14783	1067	-	2	Id=EXY11524.1; acc=EXY11524.1; tax=Bacteroides_fragilis; taxRel=above; Name=Bacteroides fragilis;
contig_842	MEGAN	CDS	14803	18248	2252	-	2	Id=WP_008770235.1; acc=WP_008770235.1; tax=Bacteroides_fragilis; taxRel=above; Name=Bacteroides fragilis;
contig_842	MEGAN	CDS	18437	19211	493	-	1	Id=WP_195460329.1; acc=WP_195460329.1; tax=Bacteroides_fragilis; taxRel=above; Name=Bacteroides fragilis;

Does this suffice?
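If a spreadsheet-style table is needed, the GFF3 lines above can be flattened with a short script. This is a minimal Python sketch, assuming the export is tab-separated with nine columns and that the last column holds "key=value;" pairs as in the example above. The function names and the choice of output columns are assumptions for illustration only.

```python
import csv
import io

FIELDS = ["contig", "start", "end", "score", "strand", "accession", "taxon"]


def parse_gff3_attributes(field):
    """Split 'Id=...; acc=...; Name=...;' into a dict of key -> value."""
    attrs = {}
    for part in field.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_rows(lines):
    """Turn GFF3 lines into dicts; skip comments and malformed lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_gff3_attributes(cols[8])
        rows.append({
            "contig": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "score": cols[5],
            "strand": cols[6],
            "accession": attrs.get("acc", ""),
            # Matches without a taxon assignment get an empty taxon field.
            "taxon": attrs.get("Name", ""),
        })
    return rows


def rows_to_tsv(rows):
    """Serialize the rows as a tab-separated table with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two lines taken from the example above.
sample = [
    "contig_842\tMEGAN\tCDS\t789\t2795\t1246\t+\t0\tId=WP_032529350.1; "
    "acc=WP_032529350.1; tax=Bacteroides_fragilis; taxRel=above; "
    "Name=Bacteroides fragilis;",
    "contig_842\tMEGAN\tCDS\t4111\t4657\t106\t-\t2\tId=MBC34808.1; acc=MBC34808.1;",
]
rows = gff3_to_rows(sample)
print(rows_to_tsv(rows))
```

To process a real export, the sample list can be replaced by an open file handle, since gff3_to_rows only iterates over lines.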

lucyintheskyzzz · October 24, 2023, 7:56pm

Hi @Daniel, is it possible to get this information (see my example below), but with family, subfamily, genus and species added? Below is my output from DIAMOND, but DIAMOND does not report family, subfamily, genus and species yet. I think they are working on it. This is why I meganized my samples in DIAMOND to a .daa file, because I saw that MEGAN gave me the family, subfamily, genus and species. However, I can’t get it to generate output like DIAMOND's, a perfect .tsv file that I can open in Excel and then use in RStudio for statistics. I have way too many blastx hits to add all the families by hand. I have >10k hits! If you could help me figure this out, I would be forever grateful!

qseqid	sseqid	pident	length	mismatch	evalue	bitscore	staxids	sscinames	sskingdoms	skingdoms	sphylums	stitle
730ccc9f-3141-4ad7-9bed-01cb917a7fe1	YP_009188291.1	47.5	59	30	1.82E-11	60.8	1589733	Cyanophage P-TIM40	Viruses	Heunggongvirae	Uroviricota	YP_009188291.1 peroxyredoxin antioxidant [Cyanophage P-TIM40]

Daniel · November 8, 2023, 7:45am

You can export read name to taxon path KPCOFGS using

File->Export->Text (CSV) Format... and then choose:

readName_to_taxonPathKPCOFGS

to get output like this:

HISEQ:457:C5366ACXX:2:1103:20670:97785  d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;
HISEQ:457:C5366ACXX:2:1103:11052:100159 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;
HISEQ:457:C5366ACXX:2:1104:15661:12007  d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;
HISEQ:457:C5366ACXX:2:1104:2472:14051   d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;
HISEQ:457:C5366ACXX:2:1104:7589:17082   d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;
HISEQ:457:C5366ACXX:2:1101:17762:3139   d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli;
HISEQ:457:C5366ACXX:2:1101:7826:4284    d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli;
HISEQ:457:C5366ACXX:2:1101:10971:6114   d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli;
H