This is the message from MEGAN:
"Executing: show window=ExtractReads;
Executing: extract what=reads outdir='/home/tanshiming/Downloads' outfile='2.fasta' data=Taxonomy ids= 1246637 allBelow=false;
Info: Number of reads written: 570"
I would like to find out whether it is possible to extract the entire FASTA sequence along with its header.
That will not make a difference in terms of extracting reads.
You want to use the long reads option if the input sequences are likely to cover more than one gene, as such sequences require different algorithms.
Could you please do the following:
Open one of the taxonomy nodes in the inspector window.
Then, in the inspector window, open a taxon, then a read, and then the Data item for that read. Do you see the read/contig header and sequence in the Data item?
You should use the long read option if the input sequences may contain or cover multiple genes, e.g. long-read metagenomic sequencing reads or assembled contigs.
I guess that if the FASTA headers really represent the taxa of interest, I can use a Unix script to pull out the FASTA sequences. I'm just curious why MEGAN was not able to do it.
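Here is a minimal sketch of the kind of script mentioned above. It is written in Python rather than as a shell one-liner, and it is not a MEGAN feature. It assumes you have the IDs of the wanted reads in a plain-text file, one per line, and that each read ID is the first whitespace-delimited word of its FASTA header. The function names `read_fasta` and `extract_by_ids` and the file names are hypothetical. Each matching record is written out with its complete original header and its full sequence.

```python
import sys
from typing import Iterator, List, Optional, Set, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'.

    Multi-line sequences are joined into a single string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_by_ids(fasta: TextIO, wanted: Set[str], out: TextIO) -> int:
    """Copy records whose ID is in `wanted` to `out`, keeping the full header.

    Assumption: the read ID is the first whitespace-delimited word of the header.
    Returns the number of records written.
    """
    written = 0
    for header, seq in read_fasta(fasta):
        parts = header.split()
        read_id = parts[0] if parts else ""
        if read_id in wanted:
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written


def main(argv: List[str]) -> int:
    # Hypothetical usage: extract_fasta.py IDS.txt INPUT.fasta OUTPUT.fasta
    if len(argv) != 4:
        print("usage: extract_fasta.py IDS.txt INPUT.fasta OUTPUT.fasta", file=sys.stderr)
        return 1
    ids_path, in_path, out_path = argv[1:]
    with open(ids_path) as f:
        # One ID per line; tolerate a leading '>' and skip blank lines.
        wanted = {ln.strip().lstrip(">") for ln in f if ln.strip()}
    with open(in_path) as fin, open(out_path, "w") as fout:
        n = extract_by_ids(fin, wanted, fout)
    print(f"records written: {n}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Streaming the input file keeps memory use proportional to a single record, which matters for large read sets. Only the set of wanted IDs is held in memory.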
You can supply contigs. The length is no problem. Whether you select the long reads option or not depends on whether you expect the contigs to contain multiple genes. If yes, then the long read option is more suitable, as it takes multiple genes into account.