"read length" or "aligned bases"?

btaboada · November 21, 2017, 9:14pm

Hello,

I am wondering what the difference is between “read length” and “aligned bases” in the read assignment mode option, since these options are not mentioned in the manual. I want to use them to analyze both reads and contigs.

Best wishes,

Ania · November 23, 2017, 12:39pm

Hi,

‘Aligned bases’ and ‘read length’ are modes of reporting abundance, i.e., how the sequences assigned to a node are counted. This can influence your analysis only through the min_support parameter: some sequences might be placed somewhere else because they didn’t fulfill the minimal abundance requirement. Other than that, it does not govern where your sequences go.

It is mainly intended for long sequences, so either contigs or Nanopore/PacBio reads. It does not make much sense in the short read mode. But if you use it for contigs, bear in mind that the counts will not reflect the original abundance, since that information was collapsed during assembly. You can annotate your contigs with magnitudes (add a statement magnitude|666, say) and have nodes scaled to reflect total magnitudes.
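As a rough illustration of the magnitude annotation, here is a minimal Python sketch that appends a magnitude|N statement to each FASTA header. The function name `annotate_magnitudes`, the `magnitudes` dictionary (contig id to coverage-like value), and the choice to append the statement to the end of the header separated by a space are all assumptions made for this example; please check the MEGAN manual for the exact placement it expects.

```python
def annotate_magnitudes(fasta_lines, magnitudes, default=1):
    """Append a 'magnitude|N' statement to every FASTA header line.

    fasta_lines: iterable of lines from a FASTA file.
    magnitudes:  hypothetical dict mapping contig id -> magnitude
                 (e.g. a read count or coverage you computed yourself).
    default:     magnitude used for contigs missing from the dict.
    """
    out = []
    for line in fasta_lines:
        line = line.rstrip()
        if line.startswith(">"):
            # The contig id is assumed to be the first token of the header.
            parts = line[1:].split()
            contig_id = parts[0] if parts else ""
            mag = magnitudes.get(contig_id, default)
            # Assumption: the statement goes at the end of the header.
            out.append(f"{line} magnitude|{mag}")
        else:
            out.append(line)
    return out


example = [">contig_1 len=5000", "ACGT", ">contig_2", "GGCC"]
annotated = annotate_magnitudes(example, {"contig_1": 666})
```

Here `annotated[0]` carries magnitude|666, while contig_2 falls back to the default of 1 because it has no entry in the dictionary.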

This assignment mode belongs to the long read analysis; the paper describing it is currently under review and has all of the details.

regards,
Ania

Daniel · November 26, 2017, 1:08pm

Here is the manuscript: MEGAN-LR: New algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs

btaboada · December 12, 2017, 6:56am

Thanks, I will read the paper.

jujo0010 · May 13, 2019, 8:48pm

Hello Ania,

I’m working with contigs from shotgun data from bacterial communities. I’ve used blastx with options -F 15 --range-culling --top 10 for long reads, and then daa-meganizer with the --longReads option. For taxonomic classification, I’ve got 177 million assignments, which I presume refer to aligned bases (rather than aligned reads). Can these aligned bases be interpreted as the abundance of the different taxa? From your reply above, it’s not clear to me whether these data can be used to report abundance.

Many thanks in advance.
Best regards,
Juanjo

Daniel · May 26, 2019, 3:49pm

Because contigs (and long reads as generated by a MinION, say) have varying lengths, the number of reads assigned to a taxon is not a good proxy for abundance.
The number of bases or aligned bases is better, but even this doesn’t address the fact that different organisms are represented by different amounts of reference sequence.
So a better way to infer taxon abundances would be to focus on a number of genes that are widely and well represented across organisms.
In MEGAN, you could do this by selecting such genes in one of the functional classifications and then extracting those reads to a new document (using the corresponding menu item) and then looking at the base counts in that smaller document.
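
To make the difference between counting reads and counting bases concrete, here is a small Python sketch with made-up numbers. It is not MEGAN code; the function name `abundance_by_mode` and the input format (a list of taxon and aligned-base pairs, one per contig or long read) are assumptions chosen just for illustration.

```python
from collections import defaultdict


def abundance_by_mode(assignments):
    """Return relative abundances per taxon, by read count and by base count.

    assignments: list of (taxon, aligned_bases) pairs, one per sequence.
    """
    reads = defaultdict(int)
    bases = defaultdict(int)
    for taxon, aligned_bases in assignments:
        reads[taxon] += 1               # every sequence counts once
        bases[taxon] += aligned_bases   # every sequence counts by its aligned length
    total_reads = sum(reads.values())
    total_bases = sum(bases.values())
    by_reads = {t: n / total_reads for t, n in reads.items()}
    by_bases = {t: n / total_bases for t, n in bases.items()}
    return by_reads, by_bases


# Toy data: one long contig for taxon A, three short ones for taxon B.
toy = [("Taxon A", 50000), ("Taxon B", 1000), ("Taxon B", 1500), ("Taxon B", 2500)]
by_reads, by_bases = abundance_by_mode(toy)
```

With these toy numbers, taxon B looks dominant by sequence count (3 of 4 sequences), while taxon A accounts for most of the aligned bases (50,000 of 55,000). Neither view corrects for how much reference sequence exists for each organism, which is why restricting the comparison to a set of well represented genes is suggested above.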