Issues importing metagenome pipeline output

Erik · September 27, 2019, 12:42pm

I’m currently doing an internship in bioinformatics (metagenome analysis) and am trying to run MEGAN6 on the output of our metagenome sequencing alignment pipeline, https://github.com/MHH-RCUG/Wochenende. The pipeline produces a BAM file, which I converted to SAM.
I downloaded the nucl_acc2tax and nucl_gi2tax files and, following the instructions in the manual, created a synonym mapping file that maps the reference sequences to their NCBI TaxIDs.
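To illustrate the mapping step, here is a minimal Python sketch that collects the reference names declared in the SAM header (the `SN:` tag of `@SQ` lines) and writes them out with TaxIDs supplied by hand. It assumes the synonym file is a two-column, tab-separated text file (name, then TaxID); please check the MEGAN manual for the exact format. The function names, reference names, and TaxIDs are hypothetical.

```python
import io
from typing import Dict, Iterable, List, TextIO


def reference_names(sam_lines: Iterable[str]) -> List[str]:
    """Collect reference names from the SN: tag of @SQ header lines."""
    names: List[str] = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header section is over
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def write_synonyms(names: Iterable[str], taxids: Dict[str, int], out: TextIO) -> List[str]:
    """Write 'name<TAB>taxid' lines (assumed format); return names lacking a TaxID."""
    missing: List[str] = []
    for name in names:
        if name in taxids:
            out.write(f"{name}\t{taxids[name]}\n")
        else:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical header with two reference sequences.
    header = ["@HD\tVN:1.6\n", "@SQ\tSN:refA\tLN:1000\n", "@SQ\tSN:refB\tLN:2000\n"]
    buf = io.StringIO()
    unmapped = write_synonyms(reference_names(header), {"refA": 9606}, buf)
    print(buf.getvalue(), end="")    # prints refA<TAB>9606
    print("no TaxID for:", unmapped)  # prints no TaxID for: ['refB']
```

Listing the names without a TaxID makes it easy to spot reference sequences that the synonym file does not yet cover.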

For testing, I created an artificial metagenome containing four microorganisms and the human genome, with a read length of 75 bp.
When I import the SAM file (Import from BLAST), I get no usable results, and MEGAN reports the following:

"WARNING: Failed to find read ‘…’ in file ‘…’ "
MEGAN cannot find the reads from the pipeline output in the .fastq file that served as the pipeline input. I have checked, and they are definitely present in that file. What could be the issue here?

Second, regardless of whether I add the (optional) READS fastq file, I get the following error message:

“Initializing binning…
Using ‘Naive LCA’ algorithm (80.0 %) for binning: Taxonomy
Binning reads…
StringIndexOutOfBoundsException: String index out of range: 75”

What could be the reason for this exception? Is the read length the problem? I searched the forum for a similar problem but found nothing.
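Whether this exception comes from a MEGAN bug or from the input cannot be told from the message alone. One quick way to rule out malformed input before sending a sample file is to check that each SAM record is internally consistent. The minimal Python sketch below flags records whose SEQ length disagrees with the query length implied by the CIGAR string (M, I, S, = and X consume query bases; H does not) or with the QUAL length. It is a sanity check, not a diagnosis of the exception, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # CIGAR operations that consume bases of SEQ


def cigar_query_length(cigar: str) -> int:
    """Number of SEQ bases implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def inconsistent_records(sam_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, QNAME, problem) for alignment lines with length mismatches."""
    problems: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, cigar, seq, qual = fields[0], fields[5], fields[9], fields[10]
        if seq == "*":
            continue  # sequence not stored (e.g. some secondary alignments)
        if cigar != "*" and cigar_query_length(cigar) != len(seq):
            problems.append((lineno, qname, "CIGAR/SEQ length mismatch"))
        if qual != "*" and len(qual) != len(seq):
            problems.append((lineno, qname, "QUAL/SEQ length mismatch"))
    return problems


if __name__ == "__main__":
    good = "r1\t0\trefA\t1\t60\t75M\t*\t0\t0\t" + "A" * 75 + "\t" + "I" * 75 + "\n"
    bad = "r2\t0\trefA\t1\t60\t76M\t*\t0\t0\t" + "A" * 75 + "\t" + "I" * 75 + "\n"
    print(inconsistent_records(["@HD\tVN:1.6\n", good, bad]))
    # [(3, 'r2', 'CIGAR/SEQ length mismatch')]
```

An empty result for the real file would suggest the records themselves are well-formed.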

Thanks in advance!

Daniel · September 30, 2019, 9:18am

Dear Erik,

"WARNING: Failed to find read ‘…’ in file ‘…’ "

There are two possible reasons for this:

1. The reads in the SAM file appear in a different order than in the fastq file. MEGAN streams through both files simultaneously and therefore requires that the reads appear in the same order.
2. The first word of each read's header line in the fastq file must be exactly the same as the query name (the first column) in the SAM file.

My guess is that (1) is causing the problems.
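Both conditions can be checked before importing. The minimal Python sketch below reads the first word of every FASTQ header and the QNAME of every SAM alignment line, collapses consecutive alignments of the same read, and reports the first SAM read that is either missing from the FASTQ (reason 2) or out of FASTQ order (reason 1). Reads absent from the SAM file, such as unmapped reads, are tolerated. It assumes uncompressed, four-line FASTQ records and single-end reads; the function names are hypothetical and not part of MEGAN.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def fastq_names(lines: Iterable[str]) -> List[str]:
    """First word of each FASTQ header, assuming 4-line records."""
    records = [line for line in lines if line.strip()]  # drop blank lines
    return [records[i][1:].split()[0] for i in range(0, len(records), 4)]


def sam_query_names(lines: Iterable[str]) -> List[str]:
    """QNAMEs in file order, collapsing consecutive alignments of the same read."""
    names: List[str] = []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        qname = line.split("\t", 1)[0]
        if not names or names[-1] != qname:
            names.append(qname)
    return names


def first_problem(sam_names: List[str], fq_names: List[str]) -> Optional[Tuple[str, str]]:
    """Return (read, reason) for the first SAM read breaking the name/order rules, else None."""
    position: Dict[str, int] = {name: i for i, name in enumerate(fq_names)}
    last = -1
    for name in sam_names:
        if name not in position:
            return name, "name not found in FASTQ"
        if position[name] <= last:
            return name, "out of order relative to FASTQ"
        last = position[name]
    return None


if __name__ == "__main__":
    fq = ["@r1 extra\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "IIII\n"]
    sam = ["@HD\tVN:1.6\n",
           "r2\t0\trefA\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
           "r1\t0\trefA\t9\t60\t4M\t*\t0\t0\tACGT\tIIII\n"]
    print(first_problem(sam_query_names(sam), fastq_names(fq)))
    # ('r1', 'out of order relative to FASTQ')
```

For real data, the same functions can be fed open file handles. A coordinate-sorted BAM, for instance, will generally not preserve the FASTQ order, which would show up here as an out-of-order read.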

“Initializing binning…
Using ‘Naive LCA’ algorithm (80.0 %) for binning: Taxonomy
Binning reads…
StringIndexOutOfBoundsException: String index out of range: 75”

Unfortunately, this looks like a bug. If you could send me a small file that exhibits the problem, I will fix it.

Erik · September 30, 2019, 9:55am

Dear Mr Huson,

thanks for the quick reply! I have sent the file to megan@inf.uni-tuebingen.de.
I will get back to you about the first error message once I find out which of the two scenarios was the cause.