Taxonomy Classification of contigs and reads

btaboada · June 27, 2017, 9:13pm

Hello,
I want to use MEGAN to analyze data generated with blastn from both contigs (assembled with IDBA) and reads (75 bp) that we were not able to assemble. How can this be reflected in MEGAN? For example, the header ID of each contig tells me how many reads were used:

headerId_readsUsedInContig

and for the reads I have

headerId_1.

I want to take this information into account, for example when comparing samples and for rarefaction, the Shannon index, etc. But even though I have contigs built from, for example, 100,000 reads, MEGAN counts each contig as one read.

Best regards,

Blanca

Daniel · June 27, 2017, 10:32pm

Hi Blanca,

There are two ways that you can proceed:

1) If you want to count reads, then add a magnitude to the header line of each contig, for example

>contig055 magnitude=666

to set the number of reads that map to contig055 to 666.
2) The latest version of MEGAN allows you to choose “read length” or “aligned bases” rather than “number of reads” as the number to be reported.
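
A minimal sketch of how the first option could be scripted is shown here. It assumes the naming scheme from the question, where every sequence ID ends in an underscore followed by the read count (for example contig055_666 for a contig, read0001_1 for an unassembled read). The function names and the regular expression are illustrative and not part of MEGAN. The original ID is kept as the first token so that it should still match the query names in an existing blastn output; only the magnitude=N field is appended.

```python
import re

# Assumption: IDs end in "_<count>", e.g. "contig055_666" or "read0001_1".
COUNT_SUFFIX = re.compile(r"_(\d+)$")


def add_magnitude(header_line):
    """Append ' magnitude=N' to a FASTA header, N taken from the ID suffix.

    Headers without a numeric suffix, or that already carry a magnitude,
    are returned unchanged.
    """
    line = header_line.rstrip("\n")
    if not line.startswith(">") or "magnitude=" in line:
        return line
    fields = line[1:].split(None, 1)  # first token is the sequence ID
    if not fields:
        return line
    match = COUNT_SUFFIX.search(fields[0])
    if match is None:
        return line
    return f"{line} magnitude={int(match.group(1))}"


def rewrite_fasta(lines):
    """Yield FASTA lines with magnitudes added to header lines only."""
    for line in lines:
        line = line.rstrip("\n")
        yield add_magnitude(line) if line.startswith(">") else line
```

To apply it, one could iterate over an open input file with rewrite_fasta and write each yielded line, followed by a newline, to a new FASTA file. It is worth checking on a small test file that MEGAN reports the expected counts after import.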

If your data contains both contigs and reads, then it would be best to use the “LongRead” algorithm for taxonomic and functional assignment, as this allows for the occurrence of multiple genes within any given contig.

However, we are currently doing a lot of work on the long read assignment algorithms, so they change from release to release. It is perhaps best not to use the long read mode until we have figured out the best way to do this.
D