NR_90 or full NR database? Which is the best option?

Hi everyone, I’d like to ask for your advice on which database you would recommend for running MEGAN7. If the execution time is shorter and the results are relatively similar, would NR_90 be the better option? Or is it preferable to use the full NR database, since the taxonomic and functional assignments might be more accurate? I’m asking because I’m about to start testing MEGAN7 with shotgun metagenomics data.

Thanks a lot for your time and help! I really love MEGAN and the valuable work you’ve contributed to this community.

Dear @jalcantara,

It is indeed a trade-off to consider. In our tests, NCBI-nr90 provided results very similar to the full NCBI-nr, with only a very small percentage of taxonomic mismatches. For functional analysis, all the reduced databases performed very well.

We will soon release a preprint with the details of this analysis, which may help you decide. In general, NCBI-nr90 is a suitable alternative to the full nr. However, if you are working with only a very small number of samples, the extra computation time matters less, so it may still be preferable to use the full NCBI-nr.

Best regards,
Anupam


Thank you for your response, @Anupam.

I have another question about using MEGAN7 in relation to this paper: https://www.biorxiv.org/content/10.1101/2024.02.17.580828v1

Is it possible to run mgPGPT analyses with MEGAN7? Could you also upload the necessary files (such as the mapping file) to the MEGAN main page and provide instructions on how to download the database? Additionally, is there any pipeline available specifically for working with phytopathogens?

Thank you very much for your support and help.

Dear @jalcantara,

This functionality is already available in MEGAN7. You can download the required files from here:
https://plabase.cs.uni-tuebingen.de/pb/download.php

A general tutorial is available here:
https://github.com/husonlab/tutorials/wiki/Tutorial-mgPGPT

Please make sure to select the correct files for your needs (e.g., the appropriate mapping files).

For the alignment step, you should use DIAMOND in either blastx or blastp mode, depending on whether your input data are raw reads or protein sequences.
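As a rough illustration of that choice, the sketch below builds a DIAMOND command line that uses blastx for nucleotide reads and blastp for protein sequences, writing DAA output (`--outfmt 100`), which MEGAN can import. The protein/nucleotide heuristic, the 0.9 threshold, the thread count and the file names (`nr90.dmnd`, `sample_R1.fastq.gz`) are hypothetical placeholders, not part of MEGAN or DIAMOND.

```python
# Characters typical of nucleotide sequences (N covers ambiguous bases).
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_protein(sequence: str, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: call a sequence protein if fewer than
    `threshold` of its letters are nucleotide characters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return nucleotide_fraction < threshold


def diamond_command(query: str, db: str, out: str,
                    protein_input: bool, threads: int = 8) -> list[str]:
    """Build (but do not run) a DIAMOND command:
    blastx for DNA reads, blastp for protein sequences, DAA output."""
    mode = "blastp" if protein_input else "blastx"
    return [
        "diamond", mode,
        "--db", db,
        "--query", query,
        "--out", out,
        "--outfmt", "100",   # 100 = DAA format
        "--threads", str(threads),
    ]


# Example with placeholder file names; the first read looks like DNA.
cmd = diamond_command("sample_R1.fastq.gz", "nr90.dmnd", "sample_R1.daa",
                      protein_input=looks_like_protein("ACGTTGCAAGGT"))
print(" ".join(cmd))
# diamond blastx --db nr90.dmnd --query sample_R1.fastq.gz --out sample_R1.daa --outfmt 100 --threads 8
# To actually run it (requires DIAMOND on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

In practice you usually know whether your input is reads or proteins, so passing `protein_input` directly is simpler than guessing from the sequence content.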

These resources may also be useful. They include metadata for pathogens: once you have aligned your data against the default database, you can match your results against this metadata to identify which taxa are pathogens.

https://plabase.cs.uni-tuebingen.de/pb/plaba_db.php

https://plabase.cs.uni-tuebingen.de/pb/pgpt_ont.php
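The matching step could be sketched as a simple join between per-taxon read counts (for example, exported from MEGAN after the alignment) and a metadata table. The tab-separated layout with `taxon` and `category` columns is a hypothetical stand-in; the actual PLaBAse metadata may be structured differently, and the taxon names and counts below are made-up toy values.

```python
import csv
import io


def load_metadata(tsv_text: str) -> dict[str, str]:
    """Map taxon name -> category from a TSV with hypothetical
    columns 'taxon' and 'category' (the real layout may differ)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["taxon"].strip(): row["category"].strip() for row in reader}


def annotate_counts(counts: dict[str, int],
                    metadata: dict[str, str]) -> dict[str, dict[str, int]]:
    """Group per-taxon read counts by metadata category; taxa that are
    not found in the metadata are collected under 'unmatched'."""
    grouped: dict[str, dict[str, int]] = {}
    for taxon, n in counts.items():
        category = metadata.get(taxon, "unmatched")
        grouped.setdefault(category, {})[taxon] = n
    return grouped


# Toy inputs for illustration only.
metadata_tsv = "taxon\tcategory\nTaxon A\tpathogen\nTaxon B\tnon-pathogen\n"
counts = {"Taxon A": 120, "Taxon B": 45, "Taxon C": 7}

grouped = annotate_counts(counts, load_metadata(metadata_tsv))
print(grouped)
# {'pathogen': {'Taxon A': 120}, 'non-pathogen': {'Taxon B': 45}, 'unmatched': {'Taxon C': 7}}
```

Keeping unmatched taxa in their own group makes it easy to see how much of the sample the metadata does not cover.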

Best regards,
Anupam


Thank you for your assistance and guidance. I will try to follow your recommendations.
