Obtaining readable taxonomic information in rma6 files for genomic data

TeriH · October 8, 2025, 7:59am

Hello,

Apologies if this has been answered previously!

I used to use MEGAN5 to generate a base rma file for my metagenomic data. I then fed this rma file into a script that created a subset rma for my taxon of interest (e.g. Viridiplantae). I’m trying to get this process working with MEGAN6 (community edition), but there appears to be no readable taxonomic information in the rma6 file when it is read on the command line - it is there in the MEGAN GUI, but the command line returns “Viridiplantae not found”. I looked at using DIAMOND and daa-meganizer to do this, but it's my understanding that this is tailored towards proteins (please correct me if I'm wrong!).

I BLAST against the nt database and used blast2rma to generate my original rma6 files. Is there a way to incorporate readable taxonomic information in these rma6 files, and to create these viridiplantae.rma6 files on the command line when working with genomic data, without having to open each rma individually in the GUI and export manually?

Thank you!

Anupam · October 14, 2025, 11:00pm

Dear @TeriH,

Please use MEGAN7. We no longer support MEGAN5, and MEGAN6 will be phased out.

A good workflow is to use a fast aligner like LAST against a nucleotide database (e.g., nt) and then feed the tabular alignments to blast2rma. Make sure you provide the current MEGAN mapping DB (from the MEGAN download page; use the nucleotide mapping file) so taxonomy is written into the .rma6:

blast2rma \
  --in alignments.tab[.gz] \
  --format BlastTab \
  --mapDB /path/to/megan-map-nucl.db \
  --out sample.rma6 \
  --threads 16
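
If you have many samples, the call above can be wrapped in a loop. The following is only a minimal sketch: it reuses exactly the options shown above, while the variable names (BLAST2RMA, MAP_DB) and function names (rma_name, meganize_all) are made up for illustration, and it assumes one tabular alignment file per sample named *.tab or *.tab.gz.

```shell
# Sketch: run blast2rma once per tabular alignment file in a directory.
# BLAST2RMA, MAP_DB, rma_name and meganize_all are hypothetical names.
BLAST2RMA="${BLAST2RMA:-blast2rma}"
MAP_DB="${MAP_DB:-/path/to/megan-map-nucl.db}"

# Derive the output name: sample.tab or sample.tab.gz -> sample.rma6
rma_name() (
  f="${1%.gz}"
  printf '%s.rma6\n' "${f%.tab}"
)

# Convert every alignment file in the given directory.
meganize_all() (
  dir="$1"
  # Stop early if the mapping DB is missing, because taxonomy
  # would otherwise not be written into the .rma6 files.
  if [ ! -f "$MAP_DB" ]; then
    echo "mapping DB not found: $MAP_DB" >&2
    exit 1
  fi
  for aln in "$dir"/*.tab "$dir"/*.tab.gz; do
    [ -e "$aln" ] || continue   # skip patterns that matched nothing
    "$BLAST2RMA" \
      --in "$aln" \
      --format BlastTab \
      --mapDB "$MAP_DB" \
      --out "$(rma_name "$aln")" \
      --threads 16 || exit 1
  done
)
```

After sourcing this, something like `MAP_DB=/path/to/megan-map-nucl.db meganize_all /path/to/alignments` would write one .rma6 file next to each alignment file.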

If you want reads for a specific organism, specify the name or taxid and extract them on the command line with the read extractor; no GUI is needed:

MEGAN/tools/read-extractor \
  --input sample.rma6 \
  --classification Taxonomy \
  --classNames "Viridiplantae" \
  --allBelow \
  --output reads_Viridiplantae.fastq.gz

(Example with taxid:)

/Applications/MEGAN/tools/read-extractor \
  -i sample.rma6 -c Taxonomy -n 33090 -b -o reads_33090.fastq.gz
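
To avoid handling each file by hand, the same extraction can be looped over a directory of .rma6 files. Again this is just a sketch under assumptions: it uses only the read-extractor options shown above, and READ_EXTRACTOR, TAXON, out_name and extract_all are hypothetical names chosen for this example. Adjust the tool path to your installation.

```shell
# Sketch: extract reads for one taxon from every .rma6 file in a directory.
# READ_EXTRACTOR, TAXON, out_name and extract_all are hypothetical names.
READ_EXTRACTOR="${READ_EXTRACTOR:-MEGAN/tools/read-extractor}"
TAXON="${TAXON:-Viridiplantae}"

# Derive the output name: sample.rma6 + Viridiplantae
# -> sample_Viridiplantae.fastq.gz
out_name() (
  printf '%s_%s.fastq.gz\n' "${1%.rma6}" "$2"
)

# Run the read extractor on each .rma6 file in the given directory.
extract_all() (
  dir="$1"
  for rma in "$dir"/*.rma6; do
    [ -e "$rma" ] || continue   # skip if the pattern matched nothing
    "$READ_EXTRACTOR" \
      --input "$rma" \
      --classification Taxonomy \
      --classNames "$TAXON" \
      --allBelow \
      --output "$(out_name "$rma" "$TAXON")" || exit 1
  done
)
```

For example, `TAXON=Viridiplantae extract_all /path/to/rma6_files` would produce one reads file per sample, named after the input file and the taxon.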

This keeps everything headless: align → blast2rma (with mapping DB) → read-extractor by taxon name or ID.

Best regards,
Anupam