sam2rma/blast2rma: weighted LCA error

Dear MEGAN community,

I'm running into a problem when using sam2rma/blast2rma to parse a SAM file: selecting the weighted LCA algorithm causes the following error:

[a1683110@acad1 MALT]$ ./0_Software/MEGAN6-8-8/tools/blast2rma -f SAM -i 16841_Medieval_MertonPriory_MPY86.blastn.sam.gz -r 16841_Medieval_MertonPriory_MPY86.fastq.gz -o . -ms 44 -me 0.01 -supp 0.1 -alg weighted -a2t nucl_acc2tax-May2017.abin -v
BLAST2RMA6 - Computes MEGAN RMA files from BLAST (or similar) files
Options:
Input
--in: 16841_Medieval_MertonPriory_MPY86.blastn.sam.gz
--format: SAM
--blastMode: Unknown
--reads: 16841_Medieval_MertonPriory_MPY86.fastq.gz
Output
--out: .
--useCompression: true
Reads
--paired: false
--pairedSuffixLength: 0
--pairedReadsInOneFile: false
Parameters
--longReads: false
--maxMatchesPerRead: 100
--classify: true
--minScore: 44.0
--maxExpected: 0.01
--topPercent: 10.0
--minSupportPercent: 0.1
--minSupport: 0
--minPercentReadCover: 0.0
--lcaAlgorithm: weighted
--weightedLCAPercent: 80.0
--readAssignmentMode: readCount
Functional classification:
Classification support:
--parseTaxonNames: true
--acc2taxa: nucl_acc2tax-May2017.abin
Other:
--firstWordIsAccession: true
--accessionTags: gb| ref|
--verbose: true
Version MEGAN Community Edition (version 6.8.18, built 21 Jul 2017)
Copyright © 2017 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading ncbi.map: 1,601,128
Loading ncbi.tre: 1,601,131
Opening file: nucl_acc2tax-May2017.abin
Processing SAM file: 16841_Medieval_MertonPriory_MPY86.blastn.sam.gz
Output file: ./16841_Medieval_MertonPriory_MPY86.blastn.rma6
Classifications: Taxonomy
Parsing file: 16841_Medieval_MertonPriory_MPY86.blastn.sam.gz
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (201.8s)
Total reads: 611,376
Alignments: 4,930,931
100% (0.0s)
Binning reads: Initialization
Binning reads…
Computing taxon-to-species map
Using ‘Weighted LCA’ assignment (80.0 %) on Taxonomy
Computing weights
Total matches: 1,122,553
Total references: 15,246
Total weights: 525,485

Binning reads: Analyzing alignments
Caught:
java.lang.NullPointerException
at megan.algorithms.AssignmentUsingWeightedLCA.computeId(AssignmentUsingWeightedLCA.java:121)
at megan.algorithms.DataProcessor.apply(DataProcessor.java:241)
at megan.core.Document.processReadHits(Document.java:520)
at megan.rma6.RMA6FromBlastCreator.parseFiles(RMA6FromBlastCreator.java:298)
at megan.tools.BLAST2RMA6.createRMA6FileFromBLAST(BLAST2RMA6.java:348)
at megan.tools.BLAST2RMA6.run(BLAST2RMA6.java:305)
at megan.tools.BLAST2RMA6.main(BLAST2RMA6.java:62)

Total reads: 1
With hits: 1
Alignments: 1
Assig. Taxonomy: 0
MinSupport set to: 1
Binning reads: Applying min-support & disabled filter to Taxonomy…
Min-supp. changes: 0
Binning reads: Writing classification tables
Numb. Tax. classes: 0
Binning reads: Syncing
Class. Taxonomy: 0
100% (107.5s)
Total time: 323s
Peak memory: 4.5 of 434.0G

The exact same error also occurs with sam2rma:
java.lang.NullPointerException
at megan.algorithms.AssignmentUsingWeightedLCA.computeId(AssignmentUsingWeightedLCA.java:121)
at megan.algorithms.DataProcessor.apply(DataProcessor.java:241)
at megan.core.Document.processReadHits(Document.java:520)
at megan.rma6.RMA6FromBlastCreator.parseFiles(RMA6FromBlastCreator.java:298)
at megan.tools.SAM2RMA6.createRMA6FileFromSAM(SAM2RMA6.java:317)
at megan.tools.SAM2RMA6.run(SAM2RMA6.java:274)
at megan.tools.SAM2RMA6.main(SAM2RMA6.java:63)

Both tools run normally when the naive LCA algorithm is used instead.
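
For readers comparing the two -alg settings, here is a simplified Python sketch of the general idea, assuming a toy taxonomy with hypothetical taxon IDs. The naive LCA assigns a read to the lowest common ancestor of all its hit taxa, while a percentage-based weighted variant assigns it to the deepest node whose subtree holds at least weightedLCAPercent (80% in the run above) of the read's hit weight. How MEGAN derives its per-reference weights (the "Computing weights" step in the log) is not described in this thread, so the weights here are simply supplied by hand; this is an illustration, not MEGAN's AssignmentUsingWeightedLCA.

```python
from typing import Dict, List

# Hypothetical toy taxonomy: child taxon ID -> parent taxon ID (root 1 is its own parent).
PARENT: Dict[int, int] = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10, 110: 11}


def path_to_root(taxon: int) -> List[int]:
    """Return the lineage from taxon up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def naive_lca(taxa: List[int]) -> int:
    """Lowest common ancestor of all hit taxa; every hit counts equally."""
    common = set(path_to_root(taxa[0]))
    for t in taxa[1:]:
        common &= set(path_to_root(t))
    # Walking up from any hit, the first shared node is the deepest common ancestor.
    for node in path_to_root(taxa[0]):
        if node in common:
            return node
    raise ValueError("no common ancestor")


def weighted_lca(weights: Dict[int, float], percent: float = 80.0) -> int:
    """Deepest node whose subtree holds >= percent of the read's total hit weight.

    With percent > 50, two nodes at the same depth cannot both qualify,
    so the deepest qualifying node is unique.
    """
    total = sum(weights.values())
    covered: Dict[int, float] = {}
    for taxon, w in weights.items():
        for node in path_to_root(taxon):
            covered[node] = covered.get(node, 0.0) + w
    threshold = total * percent / 100.0
    candidates = [n for n, w in covered.items() if w >= threshold]
    # Depth is measured by lineage length; the longest lineage is the deepest node.
    return max(candidates, key=lambda n: len(path_to_root(n)))
```

For example, naive_lca([100, 110]) returns 2, because the two hits fall under different children of node 2, whereas weighted_lca({100: 9.0, 110: 1.0}) returns 100, because 90% of the weight lies on that leaf.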

Regarding accession-to-taxonomy mapping files: how do I create one? I'd like to use a MALT index built from bacterial genomes in the RefSeq database (Full, Chromosome, and Scaffold-level assemblies), because the 2017 nt BLAST database only contains RefSeq entries from Full-level assemblies.
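
On building a mapping file: one possible approach, sketched below in Python under stated assumptions, is to take the same reference FASTA used to build the MALT index, collect the first word of each header (the accession.version, matching the firstWordIsAccession setting in the log above), and filter an NCBI accession2taxid table (tab-separated columns accession, accession.version, taxid, gi) down to those accessions. The result is a two-column accession-to-taxon-ID text file. Whether the -a2t option of this MEGAN version accepts such a text file directly, or requires it to be converted to .abin first, is an assumption to check against the MEGAN manual; the function names are hypothetical.

```python
import csv
from typing import Iterable, Set, TextIO


def read_index_accessions(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect accession.version IDs (first word after '>') from FASTA headers."""
    accessions: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            accessions.add(line[1:].split()[0])
    return accessions


def write_acc2taxid(acc2taxid: TextIO, wanted: Set[str], out: TextIO) -> int:
    """Filter an NCBI-style accession2taxid table to the wanted accessions.

    Writes one 'accession.version<TAB>taxid' line per match and returns the count.
    """
    reader = csv.DictReader(acc2taxid, delimiter="\t")
    written = 0
    for row in reader:
        acc = row["accession.version"]
        if acc in wanted:
            out.write(f"{acc}\t{row['taxid']}\n")
            written += 1
    return written
```

Gzipped inputs can be opened in text mode with gzip.open(path, "rt") and passed in place of a plain file handle.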

Thank you for your time,
Kind regards,
Raphael

Regarding the taxonomy mapping files: could I circumvent the problem by having MALT output full-text BLAST matches (i.e. --format Text), since these contain taxonomy information that blast2rma can parse without an accession mapping file?

I have identified the bug and I will upload a fixed release later this week.

If you save alignments in full-text format, they will contain taxonomic names, and MEGAN will be able to parse those names without needing a mapping file.
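
To illustrate what parsing taxon names without a mapping file involves, here is a simplified Python sketch: it drops the accession (first word) from a reference header and returns the taxon ID for the longest leading run of words that matches a known taxon name. The name-to-ID table, its IDs, and the example headers are hypothetical, and this is not MEGAN's actual name parser, which may use different matching rules.

```python
from typing import Dict, Optional


def parse_taxon_from_header(
    header: str, name2id: Dict[str, int], max_words: int = 6
) -> Optional[int]:
    """Return the taxon ID of the longest leading word run that is a known name.

    The first word is assumed to be the accession and is skipped.
    Returns None if no prefix of the remaining words matches a known name.
    """
    words = header.lstrip(">").split()[1:]  # drop the accession
    for n in range(min(max_words, len(words)), 0, -1):
        candidate = " ".join(words[:n]).rstrip(",")
        if candidate in name2id:
            return name2id[candidate]
    return None
```

Trying the longest prefix first means a species-level name such as "Bacillus cereus" wins over the genus-level "Bacillus" when both are in the table.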