I have several full metagenomes that have been run through BLAST to generate protein databases. They are all mapped to Uniprot Acc numbers.
Can I use this as input? If so, how?
MEGAN needs to map UniProt IDs to taxon IDs (and to other IDs, if you also want to use, e.g., the InterPro2GO viewer). You will need to create a "synonyms file" for each classification that you want to use; it tells MEGAN how to map words in your alignment reference header lines to taxon IDs and other IDs.
Producing those could be quite painful…
How about rerunning them using DIAMOND to compare against the NR database?
Hello Daniel,
I'm reviving this old conversation because I'm having a problem dealing with my UniProt-mapped data. Running DIAMOND against the non-redundant database is not an option for me, as it takes too much memory. I ran diamond blastx with output format 100:
$ diamond blastx -d uniprot.dmnd -q all_read_file.fastq.gz --outfmt 100 --sensitive -o all_read_file.diamond.outfmt100.daa
Since I didn't really know what was important in the SQL database (megan-map-Feb2022.db), I deleted the "mappings" table and created a new one with the same columns, but with the accession header of the UniProt sequences (as it appears in the DIAMOND results):
$ diamond view --daa all_read_file.diamond.outfmt100.daa --outfmt 6 --out all_read_file.diamond.outfmt100.tsv
$ head all_read_file.diamond.outfmt100.tsv
S0R0/1 tr|K1U1D0|K1U1D0_9ZZZZ 93.9 49 3 0 149 3 182 230 4.39e-25 102
S0R0/1 tr|A0A329U0R7|A0A329U0R7_9FIRM 93.9 49 3 0 149 3 61 109 8.58e-24 102
(…)
I filled the "Taxonomy" column with the "NCBI-taxon" value (13th column) from idmapping_selected.tab, found here at the end of April. The mapping table then looked something like this:
Accession Taxonomy GTDB EGGNOG INTERPRO2GO SEED EC
sp|Q6GZX4|001R_FRG3G 654924
sp|Q6GZX3|002L_FRG3G 654924
(…)
(NB: I changed it using the sqlite3 Python package, because I'm not very familiar with SQL databases. I also filled only the Taxonomy column, because that was the only thing I was interested in and the UniProt mapping file didn't contain the other information.)
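For readers who want to reproduce the step described above, here is a minimal sketch of how such a table could be rebuilt with Python's sqlite3 module. It is not the script used in this post. The column types in the CREATE TABLE statement, the function names, and the "sp|" prefix used to build the key are assumptions made for illustration; only the column names and the use of the 13th column of idmapping_selected.tab come from the description above.

```python
import sqlite3


def parse_idmapping_selected(lines):
    """Yield (accession, entry_name, taxon_id) from idmapping_selected.tab lines.

    Column 1 is the UniProtKB accession, column 2 the entry name and
    column 13 the NCBI-taxon value (as described in the post).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13 or not fields[12].strip().isdigit():
            continue  # skip rows without a usable taxon id
        yield fields[0], fields[1], int(fields[12])


def rebuild_mappings(con, rows):
    """Drop and recreate 'mappings', filling only Accession and Taxonomy.

    The column types are an assumption; the column names follow the post.
    """
    con.execute("DROP TABLE IF EXISTS mappings")
    con.execute(
        "CREATE TABLE mappings (Accession TEXT PRIMARY KEY, Taxonomy INT, "
        "GTDB INT, EGGNOG INT, INTERPRO2GO INT, SEED INT, EC INT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO mappings (Accession, Taxonomy) VALUES (?, ?)",
        rows,
    )
    con.commit()


# Tiny in-memory demonstration with one row taken from the table above.
sample = "\t".join(["Q6GZX4", "001R_FRG3G"] + [""] * 10 + ["654924"] + [""] * 9)
con = sqlite3.connect(":memory:")
# Assumption: the key mirrors the full header, with a hypothetical "sp|" prefix.
rows = [
    (f"sp|{acc}|{name}", taxon)
    for acc, name, taxon in parse_idmapping_selected([sample])
]
rebuild_mappings(con, rows)
print(con.execute("SELECT Accession, Taxonomy FROM mappings").fetchall())
```

For a real run, the in-memory connection would be replaced by a copy of the MEGAN mapping database and the sample list by the opened idmapping_selected.tab file.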
This wasn't a success, since the results of the command:
$ daa-meganizer -mdb 2023.03.01_1000_megan-uniprot-map.db -supp 0 -i all_read_file.diamond.outfmt100.daa
were:
Version MEGAN Community Edition (version 6.24.20, built 5 Feb 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 18.0.2.1
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: all_read_file.diamond.outfmt100.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (52,155.1s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (6.4s)
Binning reads Initializing...
Initializing binning...
Using 'Naive LCA' algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using 'Naive LCA' algorithm for binning: GTDB
Using Best-Hit algorithm for binning: EC
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads...
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3,758.7s)
Total reads: 60,609,239
With hits: 60,609,239
Alignments: 1,273,643,541
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. GTDB: 0
Assig. EC: 0
Assig. INTERPRO2GO: 0
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 100% (44.0s)
Binning reads Syncing
100% (0.0s)
Class. Taxonomy: 1
Class. SEED: 1
Class. EGGNOG: 1
Class. GTDB: 1
Class. EC: 1
Class. INTERPRO2GO: 1
Total time: 55,973.9s
Peak memory: 28.6 of 32G
How can I make that database?
Thank you in advance
Looks like you are doing everything correctly.
However, MEGAN is misinterpreting the "|" characters in your accessions, assuming that they separate different accessions. In other words, MEGAN assumes that accessions consist only of letters, digits and underscores…
I will think about how to address this… Allowing | to appear in accessions might break stuff, so perhaps I will add an option allowing this character to appear in accessions…
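To illustrate the behaviour described above, here is a small Python sketch. It is not MEGAN's actual parser; the function name and the regular expression are assumptions that simply encode "accessions consist only of letters, digits and underscores". It shows why a table keyed on the full header cannot match.

```python
import re


def split_into_candidate_accessions(header):
    """Split a header into maximal runs of letters, digits and underscores."""
    return re.findall(r"[A-Za-z0-9_]+", header)


header = "tr|K1U1D0|K1U1D0_9ZZZZ"  # reference name from the DIAMOND output above
tokens = split_into_candidate_accessions(header)
print(tokens)            # ['tr', 'K1U1D0', 'K1U1D0_9ZZZZ']
print(header in tokens)  # False: the full header is never looked up as one key
```

Under this reading, none of the pieces equals the "tr|...|..." key stored in the Accession column, which would be consistent with zero assignments. Whether keying the table on one of the pieces (for example the bare accession) works is not confirmed in this thread.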
Hello Daniel,
I have the same problem as Agnes: our server cannot handle the NR database. I ran Diamond Blastx with Uniprot 100. I am viewing the results in the MEGAN GUI with the map-id that I downloaded from https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz. However, I get an error. Could you tell me how I could load the mapping file for this database for use in the GUI? It is not clear to me.
I appreciate your response.