Hi Daniel,
I recently encountered an error using sam2rma. I used minimap2 to align PacBio HiFi reads to NCBI nt to produce a SAM file. The SAM alignments and the HiFi reads FASTA are sorted so that the reads occur in the same order in both files. Below is the log output from sam2rma:
SAM2RMA6 - Computes a MEGAN RMA (.rma) file from a SAM (.sam) file that was created by DIAMOND or MALT
Options:
Input
--in: 4-merged/Zymo6331-STD.merged.sam
--reads: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Output
--out: 6-rma/Zymo6331-STD.nucleotide.readCount.rma
--useCompression: true
Reads
--paired: false
--pairedSuffixLength: 0
Parameters
--longReads: true
--maxMatchesPerRead: 100
--classify: true
--minScore: 50.0
--maxExpected: 0.01
--topPercent: 10.0
--minSupportPercent: 0.05
--minSupport: 0
--minPercentReadCover: 0.0
--minPercentReferenceCover: 0.0
--lcaAlgorithm: longReads
--lcaCoveragePercent: 100.0
--readAssignmentMode: readCount
Classification support:
--mapDB: /home/dportik/programs/megan/db/megan-nucl-Jan201.db
Deprecated classification support:
--parseTaxonNames: true
--firstWordIsAccession: true
--accessionTags: gb| ref|
Other:
--threads: 24
--verbose: true
Version MEGAN Community Edition (version 6.19.4, built 16 Jul 2020)
Author(s) Daniel H. Huson
Copyright (C) 2020 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading ncbi.map: 2,259,889
Loading ncbi.tre: 2,259,893
Current SAM file: 4-merged/Zymo6331-STD.merged.sam
Reads file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Output file: 6-rma/Zymo6331-STD.nucleotide.readCount.rma
Classifications: Taxonomy
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file Zymo6331-STD.merged.sam
Parsing file: 4-merged/Zymo6331-STD.merged.sam
Input domination filter: MinPercentCoverToStronglyDominate=90.0 and TopPercentScoreToStronglyDominate=90.0
10% 20% 30% 40% Error parsing file near line: 99740006: For input string: "-inf"
Error parsing file near line: 99740006: For input string: "-inf"
WARNING: Failed to find read 'm64015_200911_223407/141428915/ccs' in file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Error parsing file near line: 99740006: For input string: "-inf"
WARNING: Failed to find read 'm64015_200911_223407/141428915/ccs' in file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Error parsing file near line: 99740006: For input string: "-inf"
WARNING: Failed to find read 'm64015_200911_223407/141428915/ccs' in file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Error parsing file near line: 99740006: For input string: "-inf"
WARNING: Failed to find read 'm64015_200911_223407/141428915/ccs' in file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
Error parsing file near line: 99740006: For input string: "-inf"
WARNING: Failed to find read 'm64015_200911_223407/141428915/ccs' in file: 5-fasta-sort/Zymo6331-STD.sorted.fasta
I found the offending line in the SAM file:
m64015_200911_223407/141428915/ccs 2048 CP025592.1 1624340 1 19521H41M288D129M1I103M115H * 0 0 TTAAAATTTGGTGCTCACCGATAACCTGGTTCTTCTGTAAGAGAAACATCCCCACTAGTACTTTTCCGAGGAAGCGTCGGTTAACTTCGGATAGTTCGTTAACCCGCGGAAAGGTCGCCATGGCGATAATGTAGGCCGTTGCCAGCGGAAAGATATTCCGGGTGATTGGGAAAAACGGCGTTTAAGAGGTCGCTGAAGTAAAACTGGTTCTTCCAAACTAGTTGAAGGAGGACTGAACCGGCCCCATAGGCCACCAGGACTAACCAGGTGGTGA * NM:i:289 ms:i:455 AS:i:193 nn:i:275 tp:A:P cm:i:6 s1:i:45 s2:i:47 de:f:-inf SA:Z:CP025592.1,1618018,+,11184S5474M11I3241S,60,68;CP025592.1,217695,-,383S2011M3I17513S,11,38;CP025592.1,1605099,+,7293S1941M2I10674S,60,150;CP025592.1,1624034,+,17207S306M2397S,60,1;CP025592.1,1600292,+,2640S90M17180S,60,2; rl:i:0
You will notice that the de:f:-inf tag is present, and this is what causes the error in sam2rma. I searched the minimap2 issues and found this thread which discusses the tag. It seems that "-inf" is a valid floating point number, and so this tag may appear in SAM files.
Is it possible to update sam2rma to handle this tag? It seems to be quite rare in SAM files, but I have seen it in two different datasets causing the same problem. The alternative would be to filter out alignments with this tag. Please let me know if it would be possible to update sam2rma.
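For reference, the filtering workaround could look roughly like the Python sketch below. It is only a sketch under a few assumptions: the SAM file is tab-delimited as usual, optional tags start at the 12th column, and dropping the whole alignment record is acceptable. The function names (has_nonfinite_de, filter_sam, filter_sam_file) are my own and not part of MEGAN or minimap2.

```python
import math


def has_nonfinite_de(sam_line):
    """Return True if an alignment line has a de:f tag that is not a finite number."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags follow the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("de:f:"):
            try:
                # float() accepts "-inf", "inf" and "nan"; isfinite() rejects all three.
                return not math.isfinite(float(tag[len("de:f:"):]))
            except ValueError:
                # An unparseable value is treated as a problem record as well.
                return True
    return False


def filter_sam(lines):
    """Yield header lines and every alignment whose de:f tag is finite or absent."""
    for line in lines:
        if line.startswith("@") or not has_nonfinite_de(line):
            yield line


def filter_sam_file(in_path, out_path):
    """Stream in_path to out_path, dropping alignments with a non-finite de:f tag."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_sam(src))
```

Calling filter_sam_file("4-merged/Zymo6331-STD.merged.sam", "filtered.sam") would write a filtered copy (the output name here is just a placeholder). Because only whole records are removed, the read order should stay the same as in the sorted FASTA file.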
Thanks,
Dan