Hello,
I am wondering if it would be possible to add a feature that would allow taxonomic annotations/counts to be exported in the Kraken report (‘kreport’) format, or in the MetaPhlAn3 mpa format? The full description of the Kraken format can be found here, and the mpa format is described here (abundance output).
The main advantage of the Kraken format is that it provides the cumulative read counts at each taxonomic rank. For example, if there are 6 species/strains of Bacteroides, the count for the genus Bacteroides would include the sum of these individual species counts. All of the genera in a family would contribute to a summed count for that family, etc. You can also see the number of reads assigned directly to each of these ranks. This makes it very easy to look at different ranks quickly. The same is more or less true of the mpa format, though it only reports abundances and does not provide read counts.
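To make the cumulative counts concrete, here is a minimal Python sketch of the roll-up I have in mind. The taxonomy and the read counts are made-up toy values, and the function name is just for illustration; this is not how Kraken or MEGAN compute it internally, only the arithmetic behind the report columns.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); names and counts are invented.
parent = {
    "Bacteroides fragilis": "Bacteroides",
    "Bacteroides ovatus": "Bacteroides",
    "Bacteroides": "Bacteroidaceae",
    "Bacteroidaceae": None,
}

# Reads assigned directly to each taxon (not including descendants).
direct = {
    "Bacteroides fragilis": 40,
    "Bacteroides ovatus": 25,
    "Bacteroides": 10,
    "Bacteroidaceae": 5,
}


def cumulative_counts(parent, direct):
    """Add each taxon's direct reads to itself and to every ancestor."""
    total = defaultdict(int)
    for taxon, n in direct.items():
        node = taxon
        while node is not None:
            total[node] += n
            node = parent.get(node)
    return dict(total)


clade = cumulative_counts(parent, direct)

# Print cumulative and direct counts side by side, as a kreport-style view.
for taxon in direct:
    print(f"{clade[taxon]}\t{direct[taxon]}\t{taxon}")
```

In this toy case the genus Bacteroides ends up with 75 cumulative reads (40 + 25 from the two species, plus 10 assigned directly), while its direct count stays at 10.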
I am attaching an example of the kraken and mpa formats here, if that will help in deciding the feasibility of this feature: STD-h500-k20.kraken.report.txt (19.0 KB); STD-profiled_metagenome.txt (29.9 KB)
This should work well for read counts, but I think it could be extended to base pairs as well. It would be ideal to have either of these format options available from the command line. Currently, I have been using rma2info to get read counts:
rma2info -i {input} -o {output} -r2c Taxonomy -n --bacteriaOnly
If this could include the Kraken report and/or mpa format, that would be exceptionally useful!
Thanks,
Dan