How to work with UniProt-mapped data

I have several full metagenomes that have been run through BLAST to generate protein databases. They are all mapped to UniProt accession numbers.
Can I use these as input to MEGAN? If so, how?

MEGAN needs to map UniProt IDs to taxon IDs (and to other IDs, if you also want to use, e.g., the InterPro2GO viewer). You will need to create a “synonyms file” for each classification that you want to use, specifying how words in the reference header lines of your alignments map to taxon IDs and other IDs.
Producing those could be quite painful…
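To make the idea of a synonyms file concrete, here is a minimal Python sketch that turns rows of UniProt's idmapping_selected.tab into accession/taxon-ID pairs. It assumes a plain two-column, tab-separated "word<TAB>id" layout and that column 13 holds the NCBI taxon ID; the exact format MEGAN expects should be checked in its documentation, and `write_synonyms` is a hypothetical helper name.

```python
import io
from typing import Iterable, TextIO


def write_synonyms(idmapping_rows: Iterable[str], out: TextIO) -> int:
    """Write 'accession<TAB>taxid' lines from idmapping_selected.tab rows.

    Assumptions: column 1 is the UniProtKB accession and column 13
    (index 12) is the NCBI taxon ID. The two-column output layout is
    an assumed synonyms format, not a confirmed MEGAN specification.
    Returns the number of lines written.
    """
    written = 0
    for line in idmapping_rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13 or not fields[12]:
            continue  # skip malformed rows and rows without a taxon ID
        out.write(f"{fields[0]}\t{fields[12]}\n")
        written += 1
    return written


if __name__ == "__main__":
    # Toy row built from the accession and taxon ID quoted later in this thread.
    row = "\t".join(["Q6GZX4", "001R_FRG3G"] + [""] * 10 + ["654924"])
    buf = io.StringIO()
    print(write_synonyms([row], buf), repr(buf.getvalue()))
```

For the full file you would pass an open file handle instead of the toy list; the loop streams line by line, so memory use stays small.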

How about rerunning them using DIAMOND to compare against the NR database?

Hello Daniel,
I’m reviving this old conversation because I’m having a problem with my UniProt-mapped data. Running DIAMOND against the non-redundant (NR) database is not possible for me because it takes too much memory. Instead, I ran diamond blastx against UniProt with output format 100 (DAA):
$ diamond blastx -d uniprot.dmnd -q all_read_file.fastq.gz --outfmt 100 --sensitive -o all_read_file.diamond.outfmt100.daa

Since I didn’t really know what mattered in the SQLite mapping database (megan-map-Feb2022.db), I deleted the “mappings” table and created a new one with the same columns, but keyed on the UniProt sequence headers (as they appear in the DIAMOND results):
$ diamond view --daa all_read_file.diamond.outfmt100.daa --outfmt 6 --out all_read_file.diamond.outfmt100.tsv
$ head all_read_file.diamond.outfmt100.tsv

S0R0/1 tr|K1U1D0|K1U1D0_9ZZZZ 93.9 49 3 0 149 3 182 230 4.39e-25 102
S0R0/1 tr|A0A329U0R7|A0A329U0R7_9FIRM 93.9 49 3 0 149 3 61 109 8.58e-24 102
(…)
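To see which part of each subject header is the actual UniProt accession, here is a minimal Python sketch that parses one line of the tabular output above. It assumes the default 12 outfmt 6 columns and UniProt's usual `db|ACCESSION|ENTRY_NAME` header layout; `parse_outfmt6` and `Hit` are hypothetical names.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject_header: str
    accession: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(line: str) -> Hit:
    """Parse one BLAST/DIAMOND tabular (outfmt 6) line.

    Assumed default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    # split() on any whitespace handles both real tab-separated files
    # and the space-separated lines pasted in this thread
    f = line.split()
    header = f[1]
    parts = header.split("|")
    # UniProt headers look like tr|K1U1D0|K1U1D0_9ZZZZ: the accession is field 2
    accession = parts[1] if len(parts) >= 3 else header
    return Hit(f[0], header, accession, float(f[2]), float(f[10]), float(f[11]))


if __name__ == "__main__":
    lines = [
        "S0R0/1 tr|K1U1D0|K1U1D0_9ZZZZ 93.9 49 3 0 149 3 182 230 4.39e-25 102",
        "S0R0/1 tr|A0A329U0R7|A0A329U0R7_9FIRM 93.9 49 3 0 149 3 61 109 8.58e-24 102",
    ]
    for line in lines:
        hit = parse_outfmt6(line)
        print(hit.query, hit.accession, hit.evalue)
```

Headers without `|` are passed through unchanged, so the same function works on databases that already use bare accessions.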

I filled the “Taxonomy” column with the “NCBI-taxon” value (13th column) from UniProt’s idmapping_selected.tab, downloaded at the end of April. The mapping table then looked like this:

Accession Taxonomy GTDB EGGNOG INTERPRO2GO SEED EC
sp|Q6GZX4|001R_FRG3G 654924
sp|Q6GZX3|002L_FRG3G 654924
(…)

(NB: I modified it with the Python sqlite3 package, because I’m not very familiar with SQL databases. I also filled in only the Taxonomy column, because that was all I was interested in and the UniProt mapping file doesn’t contain the other information.)
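A minimal sketch of the table rebuild described above, using Python's built-in sqlite3 module. The table and column names are taken from the post; the column types and the use of `INSERT OR REPLACE` are assumptions, any other tables or metadata MEGAN may rely on in the .db file are left untouched, and `rebuild_mappings` is a hypothetical name. Work on a copy of the database.

```python
import sqlite3
from typing import Iterable, Tuple


def rebuild_mappings(conn: sqlite3.Connection,
                     pairs: Iterable[Tuple[str, int]]) -> int:
    """Drop and recreate the 'mappings' table, filling only Accession and Taxonomy.

    Column names follow the table shown in the thread; their types are
    assumptions. Returns the number of rows in the new table.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS mappings")
    cur.execute(
        "CREATE TABLE mappings (Accession TEXT PRIMARY KEY, Taxonomy INTEGER, "
        "GTDB INTEGER, EGGNOG INTEGER, INTERPRO2GO INTEGER, SEED INTEGER, EC INTEGER)"
    )
    # Parameter placeholders avoid building SQL strings by hand;
    # unfilled classification columns stay NULL.
    cur.executemany(
        "INSERT OR REPLACE INTO mappings (Accession, Taxonomy) VALUES (?, ?)", pairs
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM mappings").fetchone()[0]


if __name__ == "__main__":
    # In-memory database for illustration; use the path of a backed-up .db in practice.
    conn = sqlite3.connect(":memory:")
    n = rebuild_mappings(conn, [("sp|Q6GZX4|001R_FRG3G", 654924),
                                ("sp|Q6GZX3|002L_FRG3G", 654924)])
    print(n)
```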
This didn’t work. Running
$ daa-meganizer -mdb 2023.03.01_1000_megan-uniprot-map.db -supp 0 -i all_read_file.diamond.outfmt100.daa
produced:

Version MEGAN Community Edition (version 6.24.20, built 5 Feb 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 18.0.2.1
Functional classifications to use: EC, EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Loading ec.map: 8,200
Loading ec.tre: 8,204
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 240,103
Loading gtdb.tre: 240,107
Loading interpro2go.map: 14,242
Loading interpro2go.tre: 28,907
Loading seed.map: 961
Loading seed.tre: 962
Meganizing: all_read_file.diamond.outfmt100.daa
Meganizing init
Annotating DAA file using FAST mode (accession database and first accession per line)
Annotating references
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (52,155.1s)
Writing
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (6.4s)
Binning reads Initializing…
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using ‘Naive LCA’ algorithm for binning: GTDB
Using Best-Hit algorithm for binning: EC
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads…
Binning reads Analyzing alignments
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3,758.7s)
Total reads: 60,609,239
With hits: 60,609,239
Alignments: 1,273,643,541
Assig. Taxonomy: 0
Assig. SEED: 0
Assig. EGGNOG: 0
Assig. GTDB: 0
Assig. EC: 0
Assig. INTERPRO2GO: 0
Binning reads Writing classification tables
10% 20% 30% 40% 50% 60% 100% (44.0s)
Binning reads Syncing
100% (0.0s)
Class. Taxonomy: 1
Class. SEED: 1
Class. EGGNOG: 1
Class. GTDB: 1
Class. EC: 1
Class. INTERPRO2GO: 1
Total time: 55,973.9s
Peak memory: 28.6 of 32G

How can I make that database?
Thank you in advance

Looks like you are doing everything correctly.
However, MEGAN is misinterpreting the “|” characters in your accessions, assuming that they separate different accessions. In other words, MEGAN assumes that accessions consist only of letters, digits and underscores…
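The behaviour described above can be illustrated with a simplified model. This is an assumption for illustration, not MEGAN's actual code: only runs of letters, digits and underscores count as accessions, and the first one per header is looked up, as the "first accession per line" note in the log suggests. Under this model a key containing "|" can never match, which is consistent with every "Assig." count being 0. The helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Illustrative model only: runs of letters, digits and underscores are
# treated as accession candidates, so "|" acts as a separator.
TOKEN = re.compile(r"[A-Za-z0-9_]+")


def candidate_accessions(header: str) -> List[str]:
    """Split a reference header into accession-like tokens."""
    return TOKEN.findall(header)


def first_accession_lookup(header: str, mapping: Dict[str, int]) -> Optional[int]:
    """Mimic a 'first accession per line' lookup against a mapping table."""
    tokens = candidate_accessions(header)
    return mapping.get(tokens[0]) if tokens else None


if __name__ == "__main__":
    header = "sp|Q6GZX4|001R_FRG3G"
    print(candidate_accessions(header))  # ['sp', 'Q6GZX4', '001R_FRG3G']
    # The rebuilt table was keyed on the full header, which no single token equals.
    table = {"sp|Q6GZX4|001R_FRG3G": 654924}
    print(first_accession_lookup(header, table))  # None
```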

I will think about how to address this… Allowing | to appear in accessions might break stuff, so perhaps I will add an option allowing this character to appear in accessions…