Hi, I have sequenced a bacterial genome from a pure culture on the MinION and assembled it with Unicycler, which generally produces one to six contigs. I would like to know whether it would be appropriate to apply the approach taken in Arumugam et al. (2019), "Annotated bacterial chromosomes from frame-shift-corrected long-read metagenomic data", to this data, even though it is not from a metagenome. Thanks.
Yes, that should definitely work. You can align the contigs against NR using DIAMOND in long-read mode, then analyze the alignments in MEGAN in long-read mode. This will allow you to explore the taxonomic assignment of the contigs. They should all fall on nodes along the path from the root to a specific species node (unless there is contamination…). The Long Read Inspector will let you see which genes are found where.
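The "all contigs on one root-to-species path" check can be expressed as a small sketch. It assumes you have already written down, yourself, which taxon MEGAN assigned each contig to; the function name, the contig names and the simplified Escherichia coli lineage below are hypothetical and for illustration only, and taxon names stand in for NCBI taxonomy IDs to keep it readable.

```python
from typing import Dict, List


def off_path_contigs(species_lineage: List[str],
                     assignments: Dict[str, str]) -> List[str]:
    """Return contigs whose assigned taxon is NOT on the root-to-species path.

    species_lineage: taxa from the root down to the expected species.
    assignments: contig name -> taxon it was assigned to (hypothetical input,
    e.g. transcribed from MEGAN). A contig assigned to a higher rank on the
    path (genus, family, ...) is still consistent with a single species;
    anything off the path may point to contamination.
    """
    on_path = set(species_lineage)
    return sorted(c for c, taxon in assignments.items() if taxon not in on_path)


if __name__ == "__main__":
    # Simplified, illustrative lineage (not a full NCBI lineage).
    lineage = ["root", "Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia",
               "Escherichia coli"]
    contigs = {"contig_1": "Escherichia coli",
               "contig_2": "Enterobacteriaceae",
               "contig_3": "Bacillus subtilis"}
    print(off_path_contigs(lineage, contigs))
```

Using a set of path nodes rather than comparing only the species keeps contigs that MEGAN placed at a higher rank (common for conserved regions such as rRNA operons) from being flagged.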
Finally, you can export all contigs in frame-shift-corrected form. This will allow you to apply tools that rely on intact reading frames for gene calling and translated alignment, such as Prokka.
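Once the corrected contigs have been exported from MEGAN, annotating them with Prokka is an ordinary Prokka run. The sketch below only builds and launches that command; the input file name, output directory and prefix are hypothetical placeholders, and it assumes prokka is installed and on PATH.

```python
import shutil
import subprocess
from typing import List


def prokka_cmd(corrected_fasta: str, outdir: str, prefix: str,
               cpus: int = 4) -> List[str]:
    # Annotate the frame-shift-corrected contigs exported from MEGAN.
    # Note: Prokka refuses to reuse an existing outdir unless --force is given.
    return ["prokka", "--outdir", outdir, "--prefix", prefix,
            "--kingdom", "Bacteria", "--cpus", str(cpus), corrected_fasta]


def run_prokka(corrected_fasta: str, outdir: str = "prokka_out",
               prefix: str = "isolate") -> None:
    if shutil.which("prokka") is None:
        raise RuntimeError("prokka not found on PATH")
    subprocess.run(prokka_cmd(corrected_fasta, outdir, prefix), check=True)
```

For example, run_prokka("contigs_corrected.fasta") would annotate a (hypothetically named) export into prokka_out/.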
Please let me know how this works for you!
Thanks so much for your prompt reply! Much appreciated.
Regarding the NCBI-nr database required for running DIAMOND, should I download all 38 nr*.gz files from https://ftp.ncbi.nlm.nih.gov/blast/db/, or do I need the single nr.gz from https://ftp.ncbi.nih.gov/blast/db/FASTA/?
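Download and use the single nr.gz. It is the protein FASTA file that DIAMOND needs; the multi-part files in blast/db/ are preformatted BLAST+ database volumes.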
Thank you. After downloading the file but before using DIAMOND, should I format the nr database with BLAST+ (the makeblastdb command), or will DIAMOND do the formatting?
DIAMOND does its own formatting (diamond makedb), so makeblastdb is not needed.
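Putting the thread's DIAMOND steps together, here is a sketch that formats nr.gz with diamond makedb and then aligns the contigs in long-read mode, writing DAA output (--outfmt 100) for MEGAN. It assumes diamond is on PATH; the function names are hypothetical, and assembly.fasta is assumed to be Unicycler's default output file name. As I understand the DIAMOND documentation, --long-reads is a shortcut that turns on range culling and frame-shift-aware alignment.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def makedb_cmd(nr_fasta_gz: str, db_prefix: str) -> List[str]:
    # DIAMOND reads the gzipped FASTA directly and writes <db_prefix>.dmnd;
    # no makeblastdb step is involved.
    return ["diamond", "makedb", "--in", nr_fasta_gz, "--db", db_prefix]


def blastx_long_reads_cmd(db_prefix: str, contigs: str, daa_out: str,
                          threads: int = 8) -> List[str]:
    # --long-reads: long-read mode; --outfmt 100: DAA file that MEGAN can open.
    return ["diamond", "blastx",
            "--db", db_prefix,
            "--query", contigs,
            "--out", daa_out,
            "--outfmt", "100",
            "--long-reads",
            "--threads", str(threads)]


def run_pipeline(nr_fasta_gz: str, contigs: str,
                 db_prefix: str = "nr", daa_out: str = "contigs.daa") -> None:
    if shutil.which("diamond") is None:
        raise RuntimeError("diamond not found on PATH")
    # Build the .dmnd database only once; it is large and slow to create.
    if not Path(db_prefix + ".dmnd").exists():
        subprocess.run(makedb_cmd(nr_fasta_gz, db_prefix), check=True)
    subprocess.run(blastx_long_reads_cmd(db_prefix, contigs, daa_out),
                   check=True)
```

For example, run_pipeline("nr.gz", "assembly.fasta") would produce contigs.daa, which is then meganized and opened in MEGAN in long-read mode.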