Problems extracting reads with spaces in the taxonomy name using the read-extractor tool


I am trying to use the read extractor tool in a customized script for extracting a specific taxonomy rank.

I found that if the taxonomy name contains a space, the name is split into separate “words”. For example, if I want to extract the reads classified as “Weivirus-like virus sp.”:

/home/human/megan/tools/read-extractor -v -i Pool1-bats_merged_ok_v2.rma -c Taxonomy -n Weivirus-like virus sp. -b -o ./families/Pool1-families/Pool1-contigs-Weivirus-like-sp.fasta
ReadExtractorTool - Extracts reads from a DAA or RMA file by classification
Input and Output
--input: Pool1-bats_merged_ok_v2.rma
--output: ./families/Pool1-families/Pool1-contigs-Weivirus-like-sp.fasta
--frameShiftCorrect: false
--classification: Taxonomy
--classNames: Weivirus-like virus sp.
--allBelow: true
--all: false
--ignoreExceptions: false
--gzipOutputFiles: true
--propertiesFile: /home/human/.MEGAN.def
--verbose: true
Version MEGAN Community Edition (version 6.25.3, built 15 Sep 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 7.8G
Loading 2,396,736
Loading ncbi.tre: 2,396,740
Warning: unknown class: 'Weivirus-like'
Warning: unknown class: 'virus'
Warning: unknown class: 'sp.'
Processing file: Pool1-bats_merged_ok_v2.rma
Extracting by Taxonomy
Writing to: ./families/Pool1-families/Pool1-contigs-Weivirus-like-sp.fasta
100% (0.0s)
Reads extracted: 0
Total time: 4.1s
Peak memory: 0 of 7.8G

I get the same output if I put double quotes (" ") or single quotes (' ') around the taxonomy name (e.g.: /home/human/megan/tools/read-extractor -v -i Pool1-bats_merged_ok_v2.rma -c Taxonomy -n "Weivirus-like virus sp." -b -o ./families/Pool1-families/Pool1-contigs-Weivirus-like-sp.fasta)

Is there a way to fix this? I need the tool to read the taxonomy name as a single name, without splitting it at the spaces.

Thanks in advance!

The following works for me:

/Applications/MEGAN6CE/tools/read-extractor -b -i /Users/huson/data/asari/1mio/Alice01-1mio.daa -o stdout -c Taxonomy -n "Roseburia intestinalis" "uncultured Clostridium sp." -v

Put quotes around each name that you want to extract for. Unfortunately, the current version of the software doesn’t show you whether names have been quoted correctly (when using the -v option), e.g. here you don’t see that the program correctly received two names, “Roseburia intestinalis” and “uncultured Clostridium sp.”:

--classNames: Roseburia intestinalis uncultured Clostridium sp.

I have fixed this in the next release, which will then use single quotes to show exactly what names were received:

--classNames: 'Roseburia intestinalis' 'uncultured Clostridium sp.'

So, depending on which shell you are using and whether you are reading commands from a file, etc., you will have to make sure that the double quotes around the names are still in effect when the command-line options reach the program.
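
If it helps, here is a small sketch for checking what actually arrives at a program when a name is passed from a script. It does not call MEGAN at all: `show_args` is a hypothetical helper that I made up for illustration, which just prints each argument it receives in single quotes, one per line. I am assuming a POSIX-style shell (sh, bash, zsh) and that the name is held in a variable, which may or may not match your script.

```shell
#!/bin/sh
# Hypothetical helper (not part of MEGAN): print every argument
# received, one per line, wrapped in single quotes.
show_args() {
  for arg in "$@"; do
    printf "'%s'\n" "$arg"
  done
}

# Unquoted on the command line: the shell splits at spaces,
# so the helper receives three separate arguments.
show_args Weivirus-like virus sp.

# Quoted: the helper receives the whole name as one argument.
show_args "Weivirus-like virus sp."

# The same applies when the name is stored in a variable, which is
# the usual situation inside a script.
name="Weivirus-like virus sp."
show_args $name      # unquoted expansion: split into three arguments
show_args "$name"    # quoted expansion: one argument

# Several names with spaces can be kept intact as positional
# parameters and passed on with "$@".
set -- "Roseburia intestinalis" "uncultured Clostridium sp."
show_args "$@"
```

If the quoted variants print one line per name here but the real call still fails, the quotes are probably being lost somewhere earlier, for example when the command line is built up as a string. In the script itself, the corresponding call would pass the variable as -n "$name" rather than -n $name.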