Hi Daniel,
I have been creating RMA files successfully with sam2rma, and it is handling very large datasets quite well. Thanks for providing this alongside MEGAN.
I think there may be a misspecification of the default setting for --readAssignmentMode in the help text:
-ram, --readAssignmentMode [string] Set the read assignment mode. Default value: readCount Legal values: readCount, readLength, alignedBases, readMagnitude
After making RMA files using the default for this flag, I noticed that total aligned bases are showing up in the plots, rather than read counts. Checking the log files, I see:
SAM2RMA6 - Computes a MEGAN RMA (.rma) file from a SAM (.sam) file that was created by DIAMOND or MALT
Options:
Input
--in: 1103_V2_D1.merged.sam
--reads: 1103_V2_D1.sorted.fasta
Output
--out: 1103_V2_D1.rma
--useCompression: true
Reads
--paired: false
--pairedSuffixLength: 0
Parameters
--longReads: true
--maxMatchesPerRead: 100
--classify: true
--minScore: 50.0
--maxExpected: 0.01
--topPercent: 10.0
--minSupportPercent: 0.05
--minSupport: 0
--minPercentReadCover: 0.0
--minPercentReferenceCover: 0.0
--lcaAlgorithm: longReads
--lcaCoveragePercent: 100.0
--readAssignmentMode: alignedBases
Classification support:
--mapDB: megan-nucl-map-Jul2020.db
Deprecated classification support:
--parseTaxonNames: true
--firstWordIsAccession: true
--accessionTags: gb| ref|
Other:
--threads: 32
--verbose: true
Version MEGAN Community Edition (version 6.19.4, built 16 Jul 2020)
The general command I’ve been using is:
sam2rma -i SAM -r READS -o OUTPUT -lg -alg longReads -t 32 -mdb DATABASE -v 2> LOG
The log file seems to indicate that the default is actually alignedBases. I just wanted to check whether this is the expected behavior, or whether something about my SAM files caused a switch in the default?
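To confirm which mode a given run actually used, the value can be pulled out of its log. This is only a minimal sketch, assuming the log contains a line of the form shown above (readAssignmentMode: followed by the value); the function name ram_mode_from_log is illustrative and not part of MEGAN:

```shell
# Print the read assignment mode reported in a sam2rma log file.
# Assumption: the log has a line like "--readAssignmentMode: alignedBases".
# The pattern ignores whatever precedes the option name, so it does not
# matter how the leading dashes were rendered.
ram_mode_from_log() {
  sed -n 's/^.*readAssignmentMode:[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$1" | head -n 1
}
```

Calling ram_mode_from_log LOG on a log like mine should print the mode name alone, which makes it easy to check a whole batch of logs in a loop.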
I can certainly re-make the RMA files using -ram readCount. However, it can take several hours to create each RMA file, so knowing the actual default would save some time moving forward.
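For reference, this is roughly how I would re-run with the mode set explicitly rather than relying on the default. It is a sketch only: the wrapper name run_sam2rma_readcount and its argument order are my own, and the options are the same ones from my command above plus -ram readCount:

```shell
# Hypothetical wrapper around the same sam2rma call, with -ram set explicitly.
# Arguments: $1 = SAM file, $2 = reads file, $3 = output .rma file,
#            $4 = mapping database, $5 = log file (receives stderr).
run_sam2rma_readcount() {
  sam2rma -i "$1" -r "$2" -o "$3" -lg -alg longReads -ram readCount \
    -t 32 -mdb "$4" -v 2> "$5"
}
```

Setting the flag explicitly means the result no longer depends on whichever default applies to a given combination of options.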
Thanks,
Dan