Problem importing prot.accession2txid

mullermeta · November 23, 2024, 2:50pm

megan
I am using MEGAN6. What format should the acc2tx.map file be in? When I load the decompressed file directly, I cannot annotate the species classification.

Anupam · November 24, 2024, 10:16am

Hi @mullermeta,

Could you please share the initial steps you followed? Are you using DIAMOND for alignment or another tool? Additionally, could you clarify the output format generated by DIAMOND or the other tool? With this information, I’ll be able to assist you more effectively.

Best regards,
Anupam

mullermeta · November 24, 2024, 11:03am

I used DIAMOND 2.1.10 to align my sequences against the non-redundant (nr) protein database:

nohup diamond blastx \
-d nr.dmnd \
-q cds.fa \
-o cds_wbbw_annotation.daa \
-f 100 --threads 36 --evalue 0.00001 -b24 -c1 \
--tmpdir /media/share/iyun1907_temp > diamond_blastx.log 2>&1

I then loaded the generated DAA file into the MEGAN7 GUI via File → Meganize DAA Files. Under the second tab, “Taxonomy”, I selected “Load accession mapping file”, loaded the prot.accession2txid file that I had extracted, and ran the species annotation. The result is 0.

Anupam · November 24, 2024, 11:37am

Hi @mullermeta,

You don’t need to upload the prot.accession2txid file; instead, you should use the megan-mapping-file.db for MEGANization. Please download the appropriate mapping file from the MEGAN7 download page, based on whether you are using the Ultimate or Community version of MEGAN7. For more details, please refer to the tutorial linked below.

https://software-ab.cs.uni-tuebingen.de/download/megan7/welcome.html

Please feel free to let me know if you have further questions.

Best regards,
Anupam

mullermeta · November 24, 2024, 12:26pm

Hi @Anupam
Thank you for your answer! I want to use prot.accession2taxid because, with the megan-nr-r1-mdb file, the annotation result contains more than the seven levels of kingdom, phylum, class, order, family, genus, and species. For example, the path “(NCBI; cellular organisms; Bacteria; FCB group; Bacteroidota/Chlorobiota group; Bacteroidota; Bacteroidia; Bacteroidales; Bacteroidaceae; Bacteroides; unclassified Bacteroides; Bacteroides sp.CG01)” includes “FCB group” and “Bacteroidota/Chlorobiota group”. So I would like to know how to set the daa2info parameters so that each level in the annotation result carries its rank prefix, for example: k__Bacteria p__(Bacteria) c__(Bacteria) o__(Bacteria) f__(Bacteria) g__(Bacteria).
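To make the desired output concrete, here is a minimal Python sketch of the transformation being asked for. It is not how MEGAN or daa2info works internally. The rank labels attached to each name are an assumption added by hand for illustration (the path string above does not carry them), and `PREFIX` and `major_ranks_only` are hypothetical names.

```python
# Rank letters for the seven major levels. "domain" is mapped to "k"
# because the post writes Bacteria as k__Bacteria.
PREFIX = {
    "domain": "k",
    "phylum": "p",
    "class": "c",
    "order": "o",
    "family": "f",
    "genus": "g",
    "species": "s",
}

# The lineage from the post, with ranks annotated by hand (assumption).
lineage = [
    ("no rank", "cellular organisms"),
    ("domain", "Bacteria"),
    ("clade", "FCB group"),
    ("clade", "Bacteroidota/Chlorobiota group"),
    ("phylum", "Bacteroidota"),
    ("class", "Bacteroidia"),
    ("order", "Bacteroidales"),
    ("family", "Bacteroidaceae"),
    ("genus", "Bacteroides"),
    ("no rank", "unclassified Bacteroides"),
    ("species", "Bacteroides sp.CG01"),
]


def major_ranks_only(ranked_lineage):
    """Drop nodes without a major rank and prefix the rest with a rank letter."""
    return "; ".join(
        f"{PREFIX[rank]}__{name}"
        for rank, name in ranked_lineage
        if rank in PREFIX
    )


print(major_ranks_only(lineage))
```

Intermediate nodes such as “FCB group” are simply skipped, so the result has at most seven entries, one per major rank.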

Anupam · November 24, 2024, 1:06pm

Hi @mullermeta,

This is not an issue. You can set the flag -mro in daa2info, which stands for “major ranks only.” This will ensure that you end up with the required 7 major taxonomic levels.

Will this approach help? The MEGAN mapping file was also generated from the NCBI prot.accession2taxid. If a protein in this file is assigned to an intermediate rank, you will observe the intermediate rank in your results, especially if all top percentage alignments are assigned to this rank, and the LCA (Lowest Common Ancestor) algorithm retains it at this level.

Or are you using a different prot.accession2taxid file?
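As a rough illustration of why an intermediate rank can show up, here is a naive LCA sketch in Python. It only takes the longest common prefix of the hits' lineages. MEGAN's actual LCA has additional parameters and is not reproduced here. The two lineages are shortened, illustrative examples, and `naive_lca` is a hypothetical helper.

```python
def naive_lca(lineages):
    """Return the deepest node shared by all root-to-leaf lineages, or None."""
    common = []
    # zip walks the lineages level by level and stops at the shortest one.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break  # the lineages diverge at this level
        common.append(nodes[0])
    return common[-1] if common else None


# Two illustrative hits that agree down to an intermediate (non-major) node.
hits = [
    ["Bacteria", "FCB group", "Bacteroidota/Chlorobiota group",
     "Bacteroidota", "Bacteroidia"],
    ["Bacteria", "FCB group", "Bacteroidota/Chlorobiota group",
     "Chlorobiota", "Chlorobiia"],
]

print(naive_lca(hits))
```

Here the two hits diverge at the phylum level, so the deepest shared node is the intermediate “Bacteroidota/Chlorobiota group” rather than one of the seven major ranks.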

Best regards,
Anupam

mullermeta · November 24, 2024, 2:07pm

Thank you very much for your answer; it is very helpful. I would also like to ask: should I use DIAMOND blastx range-culling mode for a non-redundant catalog built from contigs with CD-HIT? Here are the evaluation results for my cds.fa:
Statistics without reference   cds
contigs                        1012641
contigs (>= 0 bp)              1531202
contigs (>= 1000 bp)           453130
contigs (>= 5000 bp)           2274
contigs (>= 10000 bp)          255
contigs (>= 25000 bp)          7
contigs (>= 50000 bp)          0
Largest contig                 38076
Total length                   1121205066
Total length (>= 0 bp)         1284472008
Total length (>= 1000 bp)      709221360
Total length (>= 5000 bp)      16478622
Total length (>= 10000 bp)     3459270
Total length (>= 25000 bp)     225588
Total length (>= 50000 bp)     0
N50                            1194
N90                            660
auN                            1481.9
L50                            317044
L90                            818507

Anupam · November 24, 2024, 6:57pm

Please have a look at these threads, and then you can decide.

github.com/bbuchfink/diamond

Optimal configuration for metagenome assembly classification

opened 03:39PM - 21 Sep 18 UTC

closed 01:21PM - 22 Sep 18 UTC

bede

I'm using Diamond for classifying metagenome contigs (HiSeq/MetaSpades) against NR using a high memory server, and it works great! This workflow is so much easier than it used to be thanks to recent updates. However I suspect the sensitivity of some of the classifications for viral contigs are compromised by hits against multiple protein subjects. This is causing many long contigs to only be labelled at family level. I have reviewed the documentation extensively but remain unclear on whether I should use `--range-culling`, or whether this is even possible given that frameshift alignment mode seems inappropriate here, with me using assembled contigs (from accurate short reads) as queries. Perhaps you could recommend settings for sensitive taxonomic classification of viral and bacterial contigs? Current usage:

```
~/diamond/0.9.22/diamond blastx \
  --query <> \
  --db <> \
  --out <> \
  --more-sensitive \
  --outfmt 102 \
  --block-size 20 \
  --index-chunks 1 \
  --tmpdir /dev/shm \
  --threads 16
```

mullermeta · November 25, 2024, 1:41am

OK, I will read the threads you shared carefully before making a decision.
Best regards,
Jintian