Using read-extractor in command line to pull out unassigned reads from .daa file?

ninamaryn · January 2, 2022, 9:29pm

Hello there,

I want to analyze reads from two taxonomic categories, Streptophyta and 'Not assigned'. The reads were assigned by running DIAMOND against nr.db and then meganizing the resulting .daa file.

I have been able to extract these with MEGAN6 on my computer using the GUI. However, I want to automate this for many, many files on our computational cluster, so that I can remove contaminant sequences without losing potentially interesting data from the unassigned reads. I have easily extracted the Streptophyta reads with the following command:

$MEGAN -i $diam/Sample.daa -o $fasta/sample.fasta -c Taxonomy -n 'Streptophyta' -b true -v true

but the following does not work:
$MEGAN -i $diam/Sample.daa -o $fasta/sample.fasta -c Taxonomy -n 'Not assigned' -b true -v true
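For running the working command over a whole directory on a cluster, here is a minimal sketch. It assumes, as in the commands above, that $MEGAN points to MEGAN's read-extractor and that each sample is one .daa file. The function name extract_taxon and the output naming scheme are hypothetical; only the read-extractor flags already used above are passed through.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run read-extractor once per .daa file in a directory.
# Assumes $MEGAN holds the path to MEGAN's read-extractor (as in the post).
# Usage: extract_taxon DAA_DIR OUT_DIR TAXON_NAME
extract_taxon() {
  local daa_dir=$1 out_dir=$2 taxon=$3
  local daa sample
  for daa in "$daa_dir"/*.daa; do
    [ -e "$daa" ] || continue            # glob matched nothing: skip
    sample=$(basename "$daa" .daa)       # e.g. Sample.daa -> Sample
    # Spaces in the class name are replaced only in the output file name;
    # the class name itself is passed to -n unchanged.
    "$MEGAN" -i "$daa" -o "$out_dir/${sample}.${taxon// /_}.fasta" \
      -c Taxonomy -n "$taxon" -b true -v true
  done
}
```

For example, extract_taxon "$diam" "$fasta" Streptophyta would write one FASTA file per sample, and the same loop can be reused for any other class name once the right one is known.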

Is there a specific class name or option for extracting the unassigned reads, or a way to extract every read that is not assigned to any other taxon?

Thanks!

blaize · January 3, 2022, 6:34pm

Hi,

You should directly use DIAMOND's "--un unassigned.fa" option.
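Here is a rough sketch of that route for one sample. It assumes DIAMOND is on the PATH (or named in a hypothetical $DIAMOND variable), and the database and query paths are placeholders. "--outfmt 100" asks DIAMOND for DAA output so the file can still be meganized, and "--un" writes the queries DIAMOND could not align to a separate file. Those unaligned reads may not be exactly the same set that MEGAN reports as 'Not assigned'. The function name run_diamond_with_unaligned is hypothetical.

```shell
#!/usr/bin/env bash
# Default to the diamond binary on the PATH unless $DIAMOND is already set.
: "${DIAMOND:=diamond}"

# Hypothetical helper: align one read file and keep the unaligned queries.
# Usage: run_diamond_with_unaligned QUERY DB OUT_DIR
run_diamond_with_unaligned() {
  local query=$1 db=$2 out_dir=$3
  local sample
  sample=$(basename "$query")
  sample=${sample%%.*}   # strip all extensions (assumes no dots in sample names)
  "$DIAMOND" blastx -d "$db" -q "$query" \
    --outfmt 100 -o "$out_dir/${sample}.daa" \
    --un "$out_dir/${sample}.unaligned.fa"
}
```

Calling this inside a loop over the input read files would give, for each sample, a .daa file for meganizing plus a FASTA file of unaligned reads, without a separate extraction step.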

Bests: B