MEGAN paired reads assignment

Cedric · March 31, 2018, 8:33am

Hello
I would like to have some clarification about the taxonomic assignment of paired reads.
In the manual it is specified that:
“Consider a pair of reads r and s. Let A and B be the taxa assigned to r and s by the LCA algorithm, respectively. If taxon A is undefined because r has no alignments, then we set A = B, and vice versa.”
Does this mean that, although read r has no assignment, it will still be placed on taxon B and therefore counted? I guess this procedure is not applied if read r is a “no hit”.

Another question, regarding the specification of read pair suffixes: I have two BLASTN files, one for reads 1 and one for reads 2, that are imported into the same project. My headers are as follows:
for read1
@M03493:133:000000000-BMGCL:1:1102:12746:24235 1:N:0:1
for read2
@M03493:133:000000000-BMGCL:1:1102:12746:24235 2:N:0:1
If I specify the suffixes as 1:N:0:1 and 2:N:0:1, respectively, is this correct?

Thank you
Best regards
Cédric

Daniel · April 1, 2018, 5:28am

Dear Cedric,

the manual is not completely clear here: if read r has no taxonomic assignment, but mate s does, then r gets assigned to the same taxon as s, even if r has no hits.
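The mate rule quoted from the manual, together with this clarification, can be sketched as a small function. This is a minimal illustration under stated assumptions, not MEGAN's implementation: taxa are represented by hypothetical integer IDs, `None` stands for an undefined assignment (including a read with no hits), and the case where both mates are assigned is left unchanged because the quoted passage does not cover it.

```python
from typing import Optional, Tuple

# Hypothetical representation: a taxon is an integer ID;
# None means "undefined", i.e. the LCA step produced no assignment for that read.
Taxon = Optional[int]


def apply_mate_rule(a: Taxon, b: Taxon) -> Tuple[Taxon, Taxon]:
    """Apply the quoted paired-read rule to the LCA results A (read r) and B (mate s).

    If A is undefined, set A = B, and vice versa. This applies even when
    the read had no alignments at all.
    If both are defined, or both are undefined, they are returned unchanged,
    since the quoted rule does not say what happens in those cases.
    """
    if a is None and b is not None:
        return b, b
    if b is None and a is not None:
        return a, a
    return a, b


if __name__ == "__main__":
    # r has no hits, s was assigned to taxon 562: r is placed on 562 as well.
    print(apply_mate_rule(None, 562))   # (562, 562)
    # Neither mate has an assignment: both stay undefined.
    print(apply_mate_rule(None, None))  # (None, None)
```

So a “no hit” read whose mate is assigned is still counted under the mate's taxon in this model.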

In your example:

@M03493:133:000000000-BMGCL:1:1102:12746:24235 1:N:0:1
@M03493:133:000000000-BMGCL:1:1102:12746:24235 2:N:0:1

there is no common suffix and you should thus specify an empty suffix.
This is because MEGAN only uses the first word of the header string as the “read name”, which in this case is

@M03493:133:000000000-BMGCL:1:1102:12746:24235
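The suffix point can be illustrated with a small model of read-name extraction. The function `read_name` and its `suffix` parameter are hypothetical and only mimic the behavior described here (the read name is the first whitespace-separated word of the header, and a configured suffix would be removed from the end of that word); they are not MEGAN's code.

```python
def read_name(header: str, suffix: str = "") -> str:
    """Hypothetical model of how a read name is derived from a header line.

    Assumption: only the first whitespace-separated word is kept, and a
    non-empty suffix is removed only if that word ends with it.
    """
    name = header.split()[0]
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


r1 = "@M03493:133:000000000-BMGCL:1:1102:12746:24235 1:N:0:1"
r2 = "@M03493:133:000000000-BMGCL:1:1102:12746:24235 2:N:0:1"

# With an empty suffix, both mates already share the same name.
print(read_name(r1) == read_name(r2))      # True
# "1:N:0:1" is in the second word, which is discarded anyway,
# so specifying it as a suffix changes nothing.
print(read_name(r1, suffix="1:N:0:1"))     # @M03493:133:000000000-BMGCL:1:1102:12746:24235
# A suffix only matters when it is attached to the first word (hypothetical header).
print(read_name("read42/1 extra", suffix="/1"))  # read42
```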

Using an empty suffix will work OK as long as the alignments for the two reads of a pair do not appear consecutively. So, don’t first list all alignments for read r and then directly follow them with all alignments for mate s, because MEGAN will then not know which alignments belong to r and which belong to s.

(For the same reason, all alignments associated with a fixed read f must always appear contiguously in a file. MEGAN does not remember the set of read names seen so far, so if the alignments for f are split up, it will assume that there are multiple reads that have the same name. Not well expressed, but I hope this is understandable.)
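The contiguity requirement follows from processing alignments as a stream: a new read starts whenever the read name changes, and names seen earlier are not remembered. The sketch below models that with `itertools.groupby`, which groups only consecutive records with equal keys. The `(read_name, hit)` tuples are hypothetical stand-ins for parsed alignment lines, not MEGAN's data structures.

```python
from itertools import groupby
from operator import itemgetter


def reads_from_stream(alignments):
    """Group a stream of (read_name, hit) records into reads.

    Like a streaming parser, this only compares each record with the previous
    one: a read ends as soon as the name changes, and earlier names are not remembered.
    """
    return [(name, [hit for _, hit in group])
            for name, group in groupby(alignments, key=itemgetter(0))]


# Alignments for read "q1" are interrupted by a record for "q2".
stream = [("q1", "hitA"), ("q1", "hitB"), ("q2", "hitC"), ("q1", "hitD")]
for name, hits in reads_from_stream(stream):
    print(name, hits)
# q1 ['hitA', 'hitB']
# q2 ['hitC']
# q1 ['hitD']   <- treated as a second, different read with the same name

# Mates r and s with the same name listed back to back in one file
# collapse into a single read, so their hits can no longer be told apart.
print(reads_from_stream([("p1", "r-hit1"), ("p1", "r-hit2"), ("p1", "s-hit1")]))
# [('p1', ['r-hit1', 'r-hit2', 's-hit1'])]
```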

Cedric · April 1, 2018, 7:40am

Hi Daniel
Thank you once again for your quick response. It seems clear to me now.
Just to make sure my procedure is correct:
I have paired reads, R1 and R2, in two separate FASTQ files. I generated the two corresponding BLASTN files with MALT.
I import these two MALT-generated BLASTN files into MEGAN simultaneously, specifying “paired”.

It seems to me that in this case MEGAN handles the alignments correctly (the alignments for the two reads of a pair do not appear consecutively, because the files are imported one after the other). Can you confirm that?

Thanks again
Best regards
Cédric

Daniel · April 1, 2018, 11:08am

Yes, by using two separate files for the R1 and R2 reads, MEGAN will be able to distinguish between the alignments for the first and the second read of a pair.
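With one file per mate, whether an alignment belongs to the first or second read comes from the file it was read from, so identical read names are no longer ambiguous. The sketch below is one hypothetical way to pair reads across two such inputs by name. The record format and the functions `group_reads` and `pair_mates` are assumptions for illustration, not MALT's BLASTN output or MEGAN's internals.

```python
from itertools import groupby
from operator import itemgetter


def group_reads(alignments):
    """Group consecutive (read_name, hit) records into {read_name: [hits]}.

    Assumes each read's alignments are contiguous within its file.
    """
    return {name: [hit for _, hit in group]
            for name, group in groupby(alignments, key=itemgetter(0))}


def pair_mates(r1_alignments, r2_alignments):
    """Pair reads from the R1 and R2 inputs by name.

    Mate order is known from the input each alignment came from. A read with
    no alignments in one input gets an empty hit list (a "no hit" mate).
    """
    r1 = group_reads(r1_alignments)
    r2 = group_reads(r2_alignments)
    names = list(dict.fromkeys([*r1, *r2]))  # union of names, first-seen order
    return {name: (r1.get(name, []), r2.get(name, [])) for name in names}


r1_file = [("p1", "hitA"), ("p1", "hitB"), ("p2", "hitC")]
r2_file = [("p1", "hitD")]
pairs = pair_mates(r1_file, r2_file)
print(pairs["p1"])  # (['hitA', 'hitB'], ['hitD'])
print(pairs["p2"])  # (['hitC'], [])
```

In this model, the R2 mate of "p2" has no hits; under the mate rule from the manual it would then take the taxon assigned to its R1 mate.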

Cedric · April 1, 2018, 1:14pm

Great!
Thanks Daniel