Differences in Bacterial Read Counts

Hi Daniel,

  1. I aligned my reads against the NR database using DIAMOND, which generated x.daa files. I then ran MEGAN with prot_acc2tax-May2017.bin, using paired-end read mode for the taxonomic analysis. This step generated a y.rma file (Output 1). I checked the read counts for Bacteria and the other kingdoms (I am interested in the bacterial counts).
  2. Because MEGAN does not support paired-end mode for functional analysis, I meganized the x.daa files with the EggNOG mapping file (Read 1 and Read 2 separately). This step generated the functional data, which I exported to a CSV file. Additionally, I made a list of bacterial read counts for all x.daa files (Output 2).
  3. Next, I extracted the bacterial reads from these EggNOG-mapped x.daa files (Output 2).

I then compared the bacterial read counts from all of the steps above, and each step gives a different count. How is this possible? For reference, please see the data below:

| Read counts | Sample1 |
|---|---|
| y.rma: Bacteria | 49010684 |
| x.daa: R1 | 25861304 |
| x.daa: R2 | 23012618 |
| x.daa: Sum_R1+R2 | 48873922 |
| Only_Bacteria_reads: R1 | 25856988 |
| Only_Bacteria_reads: R2 | 23008706 |
| Only_Bacteria_reads: Sum_R1+R2 | 48865694 |

Apart from this, I am facing another problem. The y.rma files report read counts for Archaea and viruses for all samples, but the x.daa files do not report Archaea and virus read counts for all samples, even though the R1 and R2 reads for Sample1 are the same. Why does this happen?

Please help me out as soon as possible.

If I understand you correctly, you are asking whether it is possible to get different numbers of assignments to Bacteria when you process your reads (a) in paired-end mode versus (b) as separate reads. In this case, you should expect to see different numbers, because in paired mode a read A that does not have any alignments is assigned to the same taxon as its mate B, if B has a taxon assignment.
Look for "Tax. ass. by mate: xxx" in the log output to see how many reads were assigned in this way.
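
As a rough illustration of why paired mode yields more assignments, here is a toy sketch in Python. It is not MEGAN's actual implementation: the read names, the `/1` and `/2` suffix convention, and the function names `assign_by_mate` and `count_bacteria` are all hypothetical.

```python
def assign_by_mate(assignments):
    """Toy model of mate-based assignment (not MEGAN's actual code).

    `assignments` maps a read name such as "frag1/1" to a taxon name,
    or to None if the read had no alignments.
    """
    result = dict(assignments)
    for read, taxon in assignments.items():
        if taxon is not None:
            continue  # read already has its own assignment
        fragment, _, end = read.rpartition("/")
        mate = f"{fragment}/{'2' if end == '1' else '1'}"
        mate_taxon = assignments.get(mate)
        if mate_taxon is not None:
            # An unassigned read inherits the taxon of its assigned mate.
            result[read] = mate_taxon
    return result


def count_bacteria(assignments):
    return sum(1 for taxon in assignments.values() if taxon == "Bacteria")


# Hypothetical per-read assignments when R1 and R2 are processed separately.
separate = {
    "frag1/1": "Bacteria", "frag1/2": None,        # mate can rescue /2
    "frag2/1": None,       "frag2/2": None,        # neither read assigned
    "frag3/1": "Bacteria", "frag3/2": "Bacteria",  # both already assigned
}
paired = assign_by_mate(separate)

print(count_bacteria(separate))  # 3
print(count_bacteria(paired))    # 4
```

In this toy data set only `frag1/2` gains an assignment, which is the kind of read the "Tax. ass. by mate" line counts.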

With respect to your second question, please note that if you provide significantly different numbers of reads to MEGAN, then this will influence which taxa you see, due to the thresholds associated with the LCA algorithm. Please review the parameters used and set them accordingly. For example, if you only provide half as many reads, then thresholds like min support should be halved if you still want to see the same taxa appearing.
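
The scaling described above can be sketched as follows. This assumes min support is given as an absolute read count; the helper name `scaled_min_support` and the read totals are made up for illustration and are not part of MEGAN.

```python
def scaled_min_support(min_support, reads_original, reads_new):
    """Scale an absolute min-support threshold in proportion to the
    number of input reads (hypothetical helper, not a MEGAN function)."""
    scaled = round(min_support * reads_new / reads_original)
    return max(1, scaled)  # a threshold below 1 read is not meaningful


# Hypothetical totals: a paired run with 2,000,000 reads versus a
# single-file run with half as many.
print(scaled_min_support(50, 2_000_000, 1_000_000))  # 25
```

If the separate files do not contain exactly half of the reads, the same proportion applies to whatever the actual read totals are.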

Thank you so much for your quick reply.
For the first question, you understood me correctly. As you suggested, I checked “Tax. ass. by mate” in the log output, and it reports 107,918 reads. My understanding is that running MEGAN in paired-end mode produces more assignments than processing the files separately.
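
For reference, the figures quoted in this thread can be put side by side. This is a minimal sketch that uses only the numbers posted above; the variable names are illustrative.

```python
# Bacterial read counts for Sample1, as posted in this thread.
paired_bacteria = 49_010_684   # paired-end mode (.rma file)
r1_bacteria = 25_861_304       # R1 processed separately (.daa file)
r2_bacteria = 23_012_618       # R2 processed separately (.daa file)
assigned_by_mate = 107_918     # "Tax. ass. by mate" from the log

separate_sum = r1_bacteria + r2_bacteria
difference = paired_bacteria - separate_sum

print(separate_sum)                   # 48873922
print(difference)                     # 136762
print(difference - assigned_by_mate)  # 28844
```

The extra bacterial assignments in paired mode (136,762) are of the same order as the mate-assigned reads (107,918) but do not match exactly. The log figure quoted here is not broken down by taxon, so an exact match is not necessarily expected.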

For the second question, my understanding is that if I set a minimum support of 50 in paired-end mode, then I need to set it to 25 for the separate files. Is that what you meant?