Problems extracting contigs/reads

shimingt · August 6, 2019, 9:05am

Dear MEGAN community,

I have difficulties extracting reads/contigs from selected leaves. I only get the headers of the fasta sequences in the fasta file.

For example:

S620100019205:~$ head '/home/tanshiming/Downloads/2.fasta'

“c_000000511798”
“c_000000614439”
“c_000001110611”
“c_000001301338”
“c_000001553089”
“c_000001616078”
“c_000001778764”
“c_000002196326”
“c_000002295592”
“c_000002663581”

This is the message from MEGAN:
“Executing: show window=ExtractReads;
Executing: extract what=reads outdir='/home/tanshiming/Downloads' outfile='2.fasta' data=Taxonomy ids= 1246637 allBelow=false;
Info: Number of reads written: 570”

I would like to find out whether it is possible to extract the entire FASTA sequence along with the header.

Thank You

Daniel · August 6, 2019, 2:37pm

Did you supply the reads when parsing the original data?
If not, no reads will be exported.

shimingt · August 7, 2019, 2:31am

Dear @Daniel,

Thanks for the reply. I did. If I supply contigs, do I have to use the "Long Reads" option?

My contigs are longer than the 500 bp stated in the manual. Will that be a problem?

Thank you

Daniel · August 7, 2019, 12:23pm

That will not make a difference in terms of extracting reads.
You want to use the long reads option if the input sequences are likely to cover more than one gene, as this requires different algorithms.

shimingt · August 8, 2019, 1:35am

Dear @Daniel,

I am still having issues extracting the contigs, even after I parsed them while importing the BLAST file. Would you be able to advise?

In addition, in what kind of situation should I use the “long read” option?

Thank you for your time.

Daniel · August 9, 2019, 9:15am

Could you please do the following:
Open one of the taxonomy nodes in the inspector window.
Then, in the inspector window, open a taxon, then a read, and then the Data item for the read. Do you see the read/contig header and sequence in the Data item?

You should use the long read option if the input sequences may contain/cover multiple genes, i.e. for long read metagenomic sequencing reads or for assembled contigs.

shimingt · August 16, 2019, 5:35am

Dear @Daniel,

I can see the contig header, but not the sequence.

How do I troubleshoot this issue?

Thanks!

shimingt · August 21, 2019, 7:11am

Dear Daniel,

I guess if the FASTA headers really represent the taxa of interest, I can use a Unix script to pull out the FASTA sequences. I am just curious why MEGAN was not able to do it.
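A minimal sketch of such a workaround in Python, in case it helps others. It assumes the header-only file exported by MEGAN has one contig name per line (possibly wrapped in quotes, as in the output above) and that the original contigs are still available as a FASTA file. The function names `clean_id`, `load_ids` and `extract_records` are made up for this example and are not part of MEGAN.

```python
QUOTES = "\"'“”‘’"  # straight and curly quote characters to strip


def clean_id(line):
    """Return the bare sequence ID from a line, or "" if the line is empty.

    Removes surrounding quotes, a leading '>' and anything after the
    first whitespace (i.e. the FASTA description).
    """
    tokens = line.strip().strip(QUOTES).lstrip(">").split()
    return tokens[0].strip(QUOTES) if tokens else ""


def load_ids(path):
    """Read one sequence ID per line from the header-only file."""
    with open(path, encoding="utf-8") as handle:
        ids = {clean_id(line) for line in handle}
    ids.discard("")  # drop blank lines
    return ids


def extract_records(fasta_in, ids, fasta_out):
    """Copy every record of fasta_in whose ID is in ids to fasta_out.

    The ID is taken to be the header text up to the first whitespace.
    Returns the number of records written.
    """
    written = 0
    keep = False
    with open(fasta_in, encoding="utf-8") as src, \
            open(fasta_out, "w", encoding="utf-8") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                keep = bool(fields) and fields[0] in ids
                if keep:
                    written += 1
            if keep:
                # Header line or a sequence line of a selected record.
                dst.write(line)
    return written
```

For the case in this thread, one would call `load_ids` on `2.fasta` and then `extract_records` with the original contig FASTA as input (its path is not shown here). The returned count can be compared with the "Number of reads written" reported by MEGAN as a sanity check.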

Thanks.

Daniel · September 9, 2019, 12:24pm

You can supply contigs. The length is no problem. Whether you select the long reads option or not depends on whether you expect the contigs to contain multiple genes. If yes, then the long read option is more suitable, as it takes multiple genes into account.