Read-extractor not extracting properly

Gk18 · June 17, 2023, 3:55pm

Hi,

I’ve been trying to use the MEGAN read-extractor tool to extract only eukaryotic reads from my .daa file. I have been using this command:

$MEGANPATH/read-extractor -b -i …/inputfile.daa -o …/outputfile.fasta -c Taxonomy -n Eukaryota &> eukaryota_extraction.log.output &

The process appears to work, but when I check the result in the GUI, all of the reads are still there. I’ve also tried doing the read extraction in the GUI, but after running DIAMOND blastx and meganizing again I have the same issue: non-eukaryotic reads are still present. I have no idea why it is not working, as it has worked for me on another sample in the past. Any advice would be greatly appreciated.

Thank you

Daniel · August 7, 2023, 1:51pm

I just tried this and it appears to work for me, using this command line:

/Applications/MEGAN6CE/tools/read-extractor -b -i Alice00-1mio.daa -o Alice00-%t.txt -c Taxonomy -n Eukaryota -v

While my input file has 1 million reads, the output file contains only 69 reads. So I am at a loss as to what is going on in your case.
To help figure out what is going on:

What messages did the program write to the console while it was running?
Could you please try extracting reads for a taxon that has only a tiny count, a single species, say? Do you still get all reads?
Could you please try extracting reads for a taxon that is not present in the dataset? Do you still get all reads?
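
For the last two checks, it may help to count the reads in each output file directly rather than going through the GUI. The following is only a minimal sketch: it assumes the extracted reads are written in FASTA format with one header line starting with ">" per read, and the helper name count_fasta_reads and the file name in the usage comment are placeholders, not part of MEGAN.

```shell
# Count the reads in a FASTA file by counting header lines (lines starting with '>').
# Assumption: one '>' header per read in the file written by read-extractor.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Usage (hypothetical file name; substitute your own output file):
# count_fasta_reads outputfile.fasta
```

If the number printed matches the total number of reads in the input for every taxon you try, including one that is not in the dataset, that would suggest the taxon filter is not being applied at all.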