SAM format for sam2rma

I am trying to do taxonomic binning of metagenomic contigs using minimap2 (https://github.com/lh3/minimap2).

minimap2 outputs either PAF (https://github.com/lh3/miniasm/blob/master/PAF.md) or SAM formatted files.

While trying to use sam2rma I get a:

java.io.IOException: File not in SAM format: test.sam

Unfortunately, I haven’t been able to work out which part of my SAM file is wrong. How does sam2rma read SAM files?

Please give me access to an example file.

example.sam (2.8 MB)

MEGAN expects that the first line of a SAM file starts with
@HD
This is missing from your files (see https://samtools.github.io/hts-specs/SAMv1.pdf for details).
If you add that line, then the file will parse without problems.
However, there is no MD tag in your alignments, without which MEGAN cannot reconstruct the reference sequence in the alignments. So, when you look at the alignments in the inspector viewer, please do not be surprised to see ??? instead.
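
For anyone hitting the same error with an older MEGAN version, here is a minimal Python sketch of the two checks described above. It is not part of MEGAN; the function names `ensure_hd_header` and `count_missing_md` are made up for illustration, and the header values `VN:1.6` and `SO:unknown` are assumptions chosen to satisfy the SAM specification, not values MEGAN is known to require.

```python
def ensure_hd_header(sam_text: str, version: str = "1.6") -> str:
    """Return sam_text with an @HD line prepended unless it already starts with one.

    The version and sort order written here are assumptions for illustration.
    """
    if sam_text.startswith("@HD\t"):
        return sam_text
    return f"@HD\tVN:{version}\tSO:unknown\n" + sam_text


def count_missing_md(sam_text: str) -> int:
    """Count alignment records that carry no MD:Z: optional field."""
    missing = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        optional_fields = line.split("\t")[11:]  # columns 12+ hold the tags
        if not any(field.startswith("MD:Z:") for field in optional_fields):
            missing += 1
    return missing
```

To fix a file on disk, read it with `open("test.sam").read()`, pass the text through `ensure_hd_header`, and write the result to a new file. Note that `count_missing_md` also counts unmapped records, which never carry an MD tag.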

However, because the @HD line is optional, I will also add support for files starting with @PG
The newest version 6.10.5 now does that…

I see, works almost perfectly now.

When using -ram readLength, magnitudes are still in read count. Is the read length parsed from header or computed?

Also, minimap2 won’t support MD tags anymore. It is shifting to CS tags (https://github.com/lh3/minimap2#cs).
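
To illustrate what the short form of the cs tag encodes, here is a rough Python sketch that splits a cs string into operations. It follows the operators described in the minimap2 README (`:` identical block, `*` substitution, `+` insertion, `-` deletion, `~` intron, `=` identical sequence in the long form). The function names `parse_cs` and `reference_span` are hypothetical, and this is not how MEGAN or minimap2 implement it.

```python
import re

# One cs operation: identical block, substitution, long-form match,
# insertion/deletion, or intron.
_CS_OP = re.compile(r":[0-9]+|\*[a-z]{2}|[=+\-][A-Za-z]+|~[a-z]{2}[0-9]+[a-z]{2}")


def parse_cs(cs: str):
    """Split a cs string into (kind, value) tuples, raising on malformed input."""
    if cs.startswith("cs:Z:"):
        cs = cs[5:]  # drop the SAM tag prefix if present
    ops, pos = [], 0
    for m in _CS_OP.finditer(cs):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}: {cs[pos:m.start()]!r}")
        token = m.group()
        if token[0] == ":":
            ops.append(("match", int(token[1:])))
        elif token[0] == "=":
            ops.append(("match", len(token) - 1))
        elif token[0] == "*":
            ops.append(("sub", token[1:]))  # reference base, then query base
        elif token[0] == "+":
            ops.append(("ins", token[1:]))
        elif token[0] == "-":
            ops.append(("del", token[1:]))
        else:
            ops.append(("intron", int(token[3:-2])))
        pos = m.end()
    if pos != len(cs):
        raise ValueError(f"unexpected trailing text: {cs[pos:]!r}")
    return ops


def reference_span(ops) -> int:
    """Number of reference bases covered by the parsed operations."""
    span = 0
    for kind, value in ops:
        if kind in ("match", "intron"):
            span += value
        elif kind == "sub":
            span += 1
        elif kind == "del":
            span += len(value)
    return span
```

For example, `parse_cs("cs:Z::6-ata:10+gtc:4*at:3")` yields a 6-base match, a 3-base deletion, a 10-base match, a 3-base insertion, a 4-base match, one substitution, and a 3-base match. Unlike CIGAR+MD, the deleted and substituted bases are available from this one tag.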

Hi Julian,

Thanks for pointing out the move to CS tags. They appear to make more sense than CIGAR+MD, and I will add support for them in MEGAN soon.
D

Hi Daniel,
I need to quantify the number of reads assigned to each virus. For this purpose, I have mapped the reads to the reference (scaffolds) using Tophat2 or Bowtie2. I have produced the BAM file and then converted it to a SAM file.

With the SAM file, I am not able to quantify the number of reads assigned.

Could you please guide me through the steps to analyze the SAM files using MEGAN?

Your question is unclear to me: did you align your reads to a reference database for viruses? If so, you need to import the SAM file into MEGAN and use a mapping file that maps the virus references to NCBI taxon ids.

Or did you align reads to assemblies of virus data? If so, you need to annotate each
assembled contig with the number of reads that align to that contig.
For example, if your contig header line is this:

>contig-666

and you aligned 50 reads to the contig,
then change the header line to this:

>contig-666|weight|50

Then import the contigs using the option “read assignment mode” set to “read magnitude”.
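
The header rewrite described above could be scripted roughly as follows. This is only a sketch: the function names `count_reads_per_contig` and `add_weights` are made up, and several choices are assumptions rather than MEGAN requirements. Unmapped, secondary and supplementary records are skipped, each mate of a pair counts as one read, any description after the contig name is kept, and contigs with no aligned reads are left unchanged.

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count primary alignments per reference name in an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        # 0x4 unmapped, 0x100 secondary, 0x800 supplementary
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        counts[rname] += 1
    return counts


def add_weights(fasta_lines, counts):
    """Append |weight|N to the name of every contig that has aligned reads."""
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            name, _, description = line[1:].rstrip("\n").partition(" ")
            if counts.get(name, 0) > 0:
                name = f"{name}|weight|{counts[name]}"
            line = ">" + name + (" " + description if description else "") + "\n"
        out.append(line)
    return out
```

Both functions accept open file handles, for example `add_weights(open("contigs.fasta"), count_reads_per_contig(open("reads.sam")))`, where the file names are placeholders.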