Using DIAMOND/MEGAN for assembled pure culture bacterial genomes

ranitori · November 26, 2020, 9:11am

Hi, I have sequenced a bacterial genome from a pure culture with the MinION and assembled it with Unicycler, which generally yields 1-6 contigs. Would it be appropriate to apply the approach taken in the Arumugam et al. (2019) paper "Annotated bacterial chromosomes from frame-shift-corrected long-read metagenomic data" to this data, even though it is not from a metagenome? Thanks.

Daniel · November 26, 2020, 9:30am

Yes, that will definitely work. You can compare the contigs against NR using DIAMOND in long-read mode, then analyze the result in MEGAN in long-read mode. This will allow you to explore the taxonomic assignment of the contigs. They should all fall on a set of nodes along the path from the root to a specific species node (unless there is contamination). The long read inspector will allow you to see which genes are found where.
Finally, you can then export all contigs in frame-shift-corrected form. This will allow you to apply tools that use translated alignment, such as Prokka.
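
As a rough illustration of the DIAMOND step, here is a minimal Python sketch that only builds the command line and prints it. It assumes the long-read settings from the Arumugam et al. (2019) paper (frameshift-aware alignment, range culling, top 10%, DAA output). The file names `assembly.fasta`, `nr` and `assembly.daa` and the helper `diamond_long_read_cmd` are made up for the example, so please check the flags against `diamond help` for your installed version before running anything.

```python
import shlex


def diamond_long_read_cmd(contigs, db, out_daa, threads=8):
    """Build a DIAMOND blastx command line for long reads or assembled contigs."""
    return [
        "diamond", "blastx",
        "--query", contigs,     # assembled contigs in FASTA format
        "--db", db,             # DIAMOND database built from nr
        "--out", out_daa,
        "--outfmt", "100",      # DAA output, the format MEGAN reads
        "--frameshift", "15",   # frameshift-aware alignment (-F 15)
        "--range-culling",      # report hits along the whole length of each contig
        "--top", "10",          # keep hits within 10% of the best score per range
        "--threads", str(threads),
    ]


# Hypothetical file names, used only for illustration.
cmd = diamond_long_read_cmd("assembly.fasta", "nr", "assembly.daa")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The resulting DAA file still has to be meganized (given taxonomic and functional classifications using a MEGAN mapping file) with long-read mode enabled before it can be opened in MEGAN.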
Please let me know how this works for you!

ranitori · November 26, 2020, 6:54pm

Thanks so much for your prompt reply! Much appreciated.

ranitori · December 19, 2020, 10:12am

Hello again.

Regarding the NCBI-nr database required for running DIAMOND, should I be downloading all of the 38 nr*.gz files from https://ftp.ncbi.nlm.nih.gov/blast/db/

OR

do I need the single nr.gz from https://ftp.ncbi.nih.gov/blast/db/FASTA/

Thanks.

Daniel · December 19, 2020, 12:47pm

Download and use the single nr.gz.

ranitori · December 19, 2020, 11:40pm

Thank you. After downloading the file but before using DIAMOND, should I format the nr database using blast+ (the makeblastdb command), or will DIAMOND do the formatting?

Daniel · January 5, 2021, 8:32am

DIAMOND does its own formatting.
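
For reference, the formatting is done with `diamond makedb`, so `makeblastdb` is not needed. Here is a minimal Python sketch that builds and prints that command, assuming the downloaded file is called `nr.gz` and sits in the current directory. The helper name `diamond_makedb_cmd`, the output prefix `nr` and the thread count are illustrative choices, not requirements.

```python
import shlex


def diamond_makedb_cmd(fasta, db_prefix, threads=8):
    """Build the command that turns a protein FASTA file into a DIAMOND database."""
    return [
        "diamond", "makedb",
        "--in", fasta,        # protein FASTA; the gzipped nr.gz can be passed directly
        "--db", db_prefix,    # DIAMOND writes <db_prefix>.dmnd
        "--threads", str(threads),
    ]


# Hypothetical paths, used only for illustration.
cmd = diamond_makedb_cmd("nr.gz", "nr")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The resulting `nr.dmnd` file is what gets passed to `diamond blastx` via `--db`.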