MEGAN 6.7.0 comes with a first release of a new “long read” mode.
This mode is intended for the analysis of long reads or contigs, where “long” means long enough to be expected to contain multiple genes, say 1000 bp or more.
This entails three changes:
- When MEGAN parses alignment results, it uses a new algorithm that keeps alignments along the whole length of the read. In contrast, the traditional “short read” parser assumes that all alignments are competing for the same spot…
- During taxonomic binning, for each taxon the bit scores for alignments along the read are added (taking overlaps into account) and then the naive LCA is run on the top 10% of these values.
- During functional binning, multiple genes are identified along the read and so the number of functionally binned items can now be larger than the number of reads. (Traditional MEGAN analysis assumes that there is at most one gene per read…)
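The taxonomic binning step described above can be sketched in a few lines of Python. This is only an illustration, not MEGAN's actual code: all function names are hypothetical, the way overlaps are handled (each alignment's bit score is scaled by the fraction of its positions not already covered by a stronger alignment to the same taxon) is an assumption, and "top 10%" is read here as "taxa whose summed score is within 10% of the best summed score".

```python
from collections import defaultdict


def taxon_scores(alignments):
    """Sum bit scores per taxon along the read.

    alignments: iterable of (taxon, start, end, bit_score), end exclusive.
    Overlap handling is an ASSUMPTION: an alignment only contributes in
    proportion to the read positions not yet covered for that taxon.
    """
    by_taxon = defaultdict(list)
    for taxon, start, end, score in alignments:
        by_taxon[taxon].append((start, end, score))

    totals = {}
    for taxon, hits in by_taxon.items():
        covered = set()
        total = 0.0
        # Visit the strongest alignments first so they claim positions first.
        for start, end, score in sorted(hits, key=lambda h: -h[2]):
            positions = set(range(start, end))
            if positions:
                newly_covered = positions - covered
                total += score * len(newly_covered) / len(positions)
            covered |= positions
        totals[taxon] = total
    return totals


def path_to_root(taxon, parent):
    """Return the list [taxon, parent, ..., root] using a child->parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def naive_lca(taxa, parent):
    """Lowest common ancestor of the given taxa in the taxonomy tree."""
    paths = [path_to_root(t, parent)[::-1] for t in taxa]  # root first
    lca = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def assign_long_read(alignments, parent, top_percent=10.0):
    """Assign a read to the LCA of all taxa scoring within top_percent of the best."""
    totals = taxon_scores(alignments)
    if not totals:
        return None
    threshold = (1.0 - top_percent / 100.0) * max(totals.values())
    kept = [t for t, s in totals.items() if s >= threshold]
    return naive_lca(kept, parent)


# Toy taxonomy and alignments, invented purely for illustration.
parent = {
    "E. coli": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus": "Bacteria",
    "Bacteria": "root",
}
alignments = [
    ("E. coli", 0, 1000, 500), ("E. coli", 1200, 2000, 400),
    ("Salmonella", 0, 1000, 480), ("Salmonella", 1200, 2000, 390),
    ("Bacillus", 0, 300, 100),
]
result = assign_long_read(alignments, parent)
```

In this toy example the two genes on the read both hit E. coli and Salmonella with similar summed scores, while Bacillus falls well below the threshold, so the read is placed on their common ancestor, Enterobacteriaceae. A short-read style analysis that only looked at one spot on the read would not combine evidence from both genes in this way.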
There is a new viewer for analyzing long read assignments that is available in MEGAN UE: