Error in sam2rma - Failed to find read

Hello,
I’ve just tested converting a SAM output from minimap2. I used sam2rma with the following command:

sam2rma -i chunk0-1k.sam -r m64015_190924_232542.Q20.fasta_chunk_0000000-1k -lg -alg longReads -t 32 -mdb megan-nucl-map-May2020.db

These are PacBio HiFi reads that have been aligned to the NCBI nt database.

Here is the output on screen:

SAM2RMA6 - Computes a MEGAN RMA (.rma) file from a SAM (.sam) file that was created by DIAMOND or MALT
Options:
Input
	--in: chunk0-1k.sam
	--reads: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
Output
	--out: chunk0-1k-TEST.rma
	--useCompression: true
Reads
	--paired: false
	--pairedSuffixLength: 0
Parameters
	--longReads: true
	--maxMatchesPerRead: 100
	--classify: true
	--minScore: 50.0
	--maxExpected: 0.01
	--topPercent: 10.0
	--minSupportPercent: 0.05
	--minSupport: 0
	--minPercentReadCover: 0.0
	--minPercentReferenceCover: 0.0
	--lcaAlgorithm: longReads
	--lcaCoveragePercent: 100.0
	--readAssignmentMode: alignedBases
Classification support:
	--mapDB: /home/dportik/programs/megan/db/megan-nucl-map-May2020.db
Deprecated classification support:
	--parseTaxonNames: true
	--firstWordIsAccession: true
	--accessionTags: gb| ref|
Other:
	--threads: 32
	--verbose: true
Version   MEGAN Community Edition (version 6.19.4, built 16 Jul 2020)
Author(s) Daniel H. Huson
Copyright (C) 2020 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading ncbi.map: 2,259,889
Loading ncbi.tre: 2,259,893
Current SAM file: chunk0-1k.sam
Reads file:   m64015_190924_232542.Q20.fasta_chunk_0000000-1k
Output file:  chunk0-1k-TEST.rma
Classifications: Taxonomy
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file chunk0-1k.sam
Parsing file: chunk0-1k.sam
Input domination filter: MinPercentCoverToStronglyDominate=90.0 and TopPercentScoreToStronglyDominate=90.0
WARNING: Failed to find read 'm64015_190924_232542/23/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/28/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/29/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/31/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/32/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/36/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/37/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/38/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/39/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/40/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/41/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/42/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/46/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/48/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/49/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/54/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/55/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/58/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/59/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/60/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/61/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/66/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/67/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/68/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/71/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/75/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/76/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/77/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/83/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/93/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/98/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/102/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/103/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/104/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/105/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/106/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/119/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/122/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/123/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/131/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/132/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/134/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/136/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/144/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/147/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/149/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/150/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/160/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/163/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
WARNING: Failed to find read 'm64015_190924_232542/165/ccs' in file: m64015_190924_232542.Q20.fasta_chunk_0000000-1k
No further 'failed to find read' warnings...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3.0s)
Total reads:               590
Alignments:                630
100% (0.0s)
Binning reads: Initializing...
Initializing binning...
Using 'Interval-Union-LCA' algorithm (100.0 %) for binning: Taxonomy
Binning reads...
Binning reads: Analyzing alignments
Total reads:              590
Total weight:       1,292,223
With hits:                 293 
Alignments:                630
Assig. Taxonomy:           286
MinSupport set to: 646
Binning reads: Applying min-support & disabled filter to Taxonomy...
Min-supp. changes:           9
Binning reads: Writing classification tables
Numb. Tax. classes:         33
Binning reads: Syncing
Class. Taxonomy:            33
100% (1.9s)
Total time:  12s
Peak memory: 3.1 of 97.7 G

I saw on other posts that this error can be thrown if the alignments are not in the same order as the reads. I’ve checked and the alignments appear in the same order as the reads, and they have the same labels. What seems odd is that all the initial alignments are ignored up to a certain point, then it seems to find the correct alignment-to-reads pairs again.

I include the full reads, truncated SAM, and output RMA here:

m64015_190924_232542.Q20.fasta_chunk_0000000-1k (59.2 KB)
chunk0-1k-truncated.sam (1.2 MB)
chunk0-1k-TEST.rma (1.7 MB)

Is this possibly a bug in sam2rma, or is there an issue with my input files?

Thanks,
Dan

Thank you for providing those files.
Here is the list of names present in your reads file:

grep ">" m64015_190924_232542.Q20.fasta_chunk_0000000-1k
>m64015_190924_232542/1/ccs
>m64015_190924_232542/4/ccs
>m64015_190924_232542/6/ccs
>m64015_190924_232542/12/ccs
>m64015_190924_232542/13/ccs
>m64015_190924_232542/17/ccs
>m64015_190924_232542/18/ccs

None of these are reported as “Failed to find read”. On the other hand, none of the reads that are reported as “Failed to find read” by MEGAN appear in the reads file. So, for these files, at least, the program appears to work as intended…
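The same comparison can be done for every read at once rather than by eye. Below is a minimal Python sketch, not part of MEGAN, that collects the names in a FASTA file and lists the SAM query names that have no matching read. The function names (`fasta_names`, `sam_query_names`, `missing_reads`) are hypothetical, and it assumes the read name is the first whitespace-delimited word of the FASTA header, which may not be exactly how sam2rma matches names.

```python
import io


def fasta_names(handle):
    """Return the set of read names (first word after '>') in a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def sam_query_names(handle):
    """Return the distinct query names (SAM column 1) in order of first appearance."""
    seen = {}
    for line in handle:
        # Header lines start with '@'; SAM query names cannot start with '@'.
        if line.startswith("@") or not line.strip():
            continue
        qname = line.split("\t", 1)[0]
        seen.setdefault(qname, None)
    return list(seen)


def missing_reads(fasta_handle, sam_handle):
    """List SAM query names that do not occur in the FASTA stream."""
    known = fasta_names(fasta_handle)
    return [q for q in sam_query_names(sam_handle) if q not in known]


# Tiny in-memory demo with made-up sequences and alignments.
demo_fasta = io.StringIO(
    ">m64015_190924_232542/1/ccs\nACGT\n"
    ">m64015_190924_232542/4/ccs\nGGCC\n"
)
demo_sam = io.StringIO(
    "@HD\tVN:1.6\n"
    + "\t".join(["m64015_190924_232542/1/ccs", "0", "ref1", "1", "60", "4M", "*", "0", "0", "ACGT", "*"]) + "\n"
    + "\t".join(["m64015_190924_232542/23/ccs", "0", "ref2", "1", "60", "4M", "*", "0", "0", "ACGT", "*"]) + "\n"
)
print(missing_reads(demo_fasta, demo_sam))
# prints ['m64015_190924_232542/23/ccs']
```

For real files, open both with `open(path)` and pass the handles to `missing_reads`. The length of the returned list is the true number of SAM reads absent from the reads file, independent of how many warnings are printed.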

For running DIAMOND on a cluster, please visit this page:
http://www.diamondsearch.org/index.php?pages/distributed_computing/

Hi Daniel,
I understand. My other concern is that there are also many other alignments in the SAM file that are not in the reads file, but these are not reported as missing:

m64015_190924_232542/165/ccs
m64015_190924_232542/166/ccs
m64015_190924_232542/169/ccs
m64015_190924_232542/169/ccs
m64015_190924_232542/169/ccs
m64015_190924_232542/169/ccs
m64015_190924_232542/171/ccs
m64015_190924_232542/174/ccs
m64015_190924_232542/177/ccs
m64015_190924_232542/179/ccs
m64015_190924_232542/179/ccs
m64015_190924_232542/179/ccs
m64015_190924_232542/180/ccs
m64015_190924_232542/180/ccs
m64015_190924_232542/180/ccs
m64015_190924_232542/183/ccs
m64015_190924_232542/190/ccs
m64015_190924_232542/191/ccs
m64015_190924_232542/196/ccs
m64015_190924_232542/199/ccs
m64015_190924_232542/200/ccs
m64015_190924_232542/201/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/202/ccs
m64015_190924_232542/204/ccs
m64015_190924_232542/210/ccs
m64015_190924_232542/214/ccs
m64015_190924_232542/214/ccs
m64015_190924_232542/214/ccs
m64015_190924_232542/215/ccs
m64015_190924_232542/216/ccs
m64015_190924_232542/217/ccs
m64015_190924_232542/218/ccs
m64015_190924_232542/221/ccs
m64015_190924_232542/223/ccs
m64015_190924_232542/223/ccs
m64015_190924_232542/223/ccs
m64015_190924_232542/224/ccs
m64015_190924_232542/226/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/228/ccs
m64015_190924_232542/229/ccs
m64015_190924_232542/231/ccs
m64015_190924_232542/232/ccs
m64015_190924_232542/245/ccs
m64015_190924_232542/245/ccs
m64015_190924_232542/247/ccs
m64015_190924_232542/255/ccs
m64015_190924_232542/256/ccs
m64015_190924_232542/259/ccs
m64015_190924_232542/262/ccs
m64015_190924_232542/267/ccs
m64015_190924_232542/267/ccs
m64015_190924_232542/267/ccs
m64015_190924_232542/268/ccs
m64015_190924_232542/275/ccs
m64015_190924_232542/276/ccs
m64015_190924_232542/282/ccs
m64015_190924_232542/284/ccs
m64015_190924_232542/286/ccs
m64015_190924_232542/290/ccs
m64015_190924_232542/291/ccs
m64015_190924_232542/295/ccs
m64015_190924_232542/307/ccs
m64015_190924_232542/308/ccs
m64015_190924_232542/309/ccs
m64015_190924_232542/316/ccs
m64015_190924_232542/316/ccs
m64015_190924_232542/316/ccs
m64015_190924_232542/317/ccs
m64015_190924_232542/321/ccs
m64015_190924_232542/329/ccs
m64015_190924_232542/331/ccs
m64015_190924_232542/331/ccs
m64015_190924_232542/331/ccs
m64015_190924_232542/331/ccs
m64015_190924_232542/333/ccs
m64015_190924_232542/334/ccs
m64015_190924_232542/338/ccs
m64015_190924_232542/348/ccs
m64015_190924_232542/359/ccs
m64015_190924_232542/363/ccs
m64015_190924_232542/364/ccs
m64015_190924_232542/367/ccs
m64015_190924_232542/367/ccs
m64015_190924_232542/367/ccs
m64015_190924_232542/370/ccs
m64015_190924_232542/370/ccs
m64015_190924_232542/370/ccs
m64015_190924_232542/371/ccs
m64015_190924_232542/373/ccs
m64015_190924_232542/385/ccs
m64015_190924_232542/385/ccs
m64015_190924_232542/385/ccs
m64015_190924_232542/388/ccs
m64015_190924_232542/390/ccs
m64015_190924_232542/392/ccs
m64015_190924_232542/396/ccs
m64015_190924_232542/397/ccs
m64015_190924_232542/398/ccs
m64015_190924_232542/405/ccs

Does MEGAN stop reporting these warnings after a certain number (here, 50)? Otherwise it would seem that these alignments have been paired with reads in the reads file, but we know they do not exist there (as you’ve shown). This would help clarify another issue I am trying to sort out.

Yes, MEGAN stops reporting these warnings after a fixed number.
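To illustrate why the number of printed warnings understates the number of missing reads, here is a small Python sketch of a capped warning reporter. It is not MEGAN's actual code: the function `report_missing` is hypothetical, and the cap of 50 is an assumption taken from the 50 warnings followed by "No further 'failed to find read' warnings..." in the log above.

```python
def report_missing(missing, max_warnings=50):
    """Build warning lines for missing reads, stopping after max_warnings.

    max_warnings=50 is an assumption based on the log in this thread,
    not a documented MEGAN setting.
    """
    lines = []
    for i, name in enumerate(missing):
        if i == max_warnings:
            # Cap reached: emit one final notice and stay silent afterwards.
            lines.append("No further 'failed to find read' warnings...")
            break
        lines.append(f"WARNING: Failed to find read '{name}'")
    return lines


# 120 made-up missing reads produce 50 warnings plus one closing notice.
missing = [f"example_run/{i}/ccs" for i in range(120)]
lines = report_missing(missing)
print(len(lines))  # prints 51
print(lines[-1])   # prints No further 'failed to find read' warnings...
```

Under this behaviour, any read missing after the 50th is dropped silently, which would be consistent with the alignments from read 165 onward producing no further warnings.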