I use MEGAN6. What format should the acc2tx.map file be in? I can’t get the taxonomic (species) classification by loading the decompressed file directly.
Hi @mullermeta,
Could you please share the initial steps you followed? Are you using DIAMOND for alignment or another tool? Additionally, could you clarify the output format generated by DIAMOND or the other tool? With this information, I’ll be able to assist you more effectively.
Best regards,
Anupam
I used DIAMOND 2.1.10 to align the sequences of my non-redundant gene catalog (cds.fa) against the NR protein database (nr.dmnd):
nohup diamond blastx \
-d nr.dmnd \
-q cds.fa \
-o cds_wbbw_annotation.daa \
-f 100 --threads 36 --evalue 0.00001 -b 24 -c 1 \
--tmpdir /media/share/iyun1907_temp > diamond_blastx.log 2>&1 &
Then I loaded the generated DAA file into the MEGAN7 GUI via File > Meganize DAA File. On the second tab, “Taxonomy”, I selected “Load Accession mapping file”, loaded the prot.accession2taxid file that I had extracted directly, and ran the taxonomic annotation. The result was 0.
Hi @mullermeta,
You don’t need to load the prot.accession2taxid file; instead, you should use the megan-mapping-file.db for MEGANization. Please download the appropriate mapping file from the MEGAN7 download page, depending on whether you are using the Ultimate or Community edition of MEGAN7. For more details, please refer to the tutorial linked below.
https://software-ab.cs.uni-tuebingen.de/download/megan7/welcome.html
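If you prefer to script the MEGANization instead of using the GUI, MEGAN ships command-line tools (typically in its tools directory), including daa-meganizer. The sketch below assumes the flags -i (input DAA file), -mdb (mapping DB) and -t (threads) as I recall them from its help text; please verify them with `daa-meganizer -h` for your version. megan-map.db is only a placeholder for whichever mapping file you downloaded.

```shell
# Sketch only: megan-map.db is a placeholder for the downloaded mapping file,
# and the flags -i, -mdb and -t are assumptions to verify with `daa-meganizer -h`.
daa_file=cds_wbbw_annotation.daa   # DAA output of the DIAMOND run
map_db=megan-map.db                # placeholder name for the mapping DB
threads=36

if command -v daa-meganizer >/dev/null 2>&1 && [ -f "$daa_file" ] && [ -f "$map_db" ]; then
  # MEGANize in place: writes the taxonomic (and other) assignments into the DAA file
  daa-meganizer -i "$daa_file" -mdb "$map_db" -t "$threads"
else
  echo "daa-meganizer, $daa_file or $map_db not found; nothing was run" >&2
fi
```

The guard simply skips the run when the tool or files are missing, so the snippet is safe to paste into a larger pipeline script.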
Please feel free to let me know if you have further questions.
Best regards,
Anupam
Hi @Anupam
Thanks for your answer! I want to use prot.accession2taxid because, with the megan-nr-r1-mdb file, the annotation results contain more levels than the seven major ranks (kingdom, phylum, class, order, family, genus and species). For example, the lineage “(NCBI; cellular organisms; Bacteria; FCB group; Bacteroidota/Chlorobiota group; Bacteroidota; Bacteroidia; Bacteroidales; Bacteroidaceae; Bacteroides; unclassified Bacteroides; Bacteroides sp.CG01)” contains “FCB group” and “Bacteroidota/Chlorobiota group”. So I would like to know how to set the daa2info parameters to produce results in which each level carries its rank, for example k__Bacteria p__(Bacteria) c__(Bacteria) o__(Bacteria) f__(Bacteria) g__(Bacteria).
Hi @mullermeta,
This is not an issue. You can set the flag -mro in daa2info, which stands for “major ranks only.” This will ensure that you end up with the required 7 major taxonomic levels.
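A daa2info call along these lines should list each read’s taxonomic path restricted to major ranks, with a rank letter prefixed to each level. This is a sketch: -mro is the flag mentioned here, while -i, -o, -r2c Taxonomy (read-to-taxon listing), -p (report paths) and -r (prefix rank) come from my reading of daa2info’s help text and should be checked with `daa2info -h`. The output file name is hypothetical, and the exact prefix style may differ from the k__/p__ format shown above.

```shell
# Sketch only: cds_taxonomy.txt is a hypothetical output name, and every flag
# except -mro is an assumption to confirm with `daa2info -h` for your MEGAN version.
daa_file=cds_wbbw_annotation.daa   # the MEGANized DAA file
out_file=cds_taxonomy.txt

if command -v daa2info >/dev/null 2>&1 && [ -f "$daa_file" ]; then
  # -r2c Taxonomy: read-to-taxon assignments
  # -p: report class paths; -r: prefix a rank letter; -mro: major ranks only
  daa2info -i "$daa_file" -o "$out_file" -r2c Taxonomy -p -r -mro
else
  echo "daa2info or $daa_file not found; nothing was run" >&2
fi
```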
Will this approach help? The MEGAN mapping file was also generated from the NCBI prot.accession2taxid. If a protein in this file is mapped to a taxon at an intermediate rank, you will see that intermediate rank in your results, especially if all alignments within the top percentage are assigned to it, so that the LCA (Lowest Common Ancestor) algorithm keeps the read at that level.
Or are you using a different prot.accession2taxid file?
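To illustrate how the LCA step can leave a read on a non-major node, here is a deliberately simplified sketch, not MEGAN’s actual implementation (which has further parameters and LCA variants): keep the hits whose bit score lies within a top-percent window of the best hit (10% is assumed here), then report the longest common prefix of their lineages. The function name simple_lca and its input format are hypothetical.

```shell
# Simplified LCA sketch, NOT MEGAN's implementation:
# 1) keep hits whose bit score is within TOP percent of the best hit,
# 2) report the longest common prefix of their ';'-separated lineages.
# Input lines: "bitscore<TAB>lineage".
simple_lca() {
  top="${1:-10}"   # top-percent threshold; 10 is an assumed default
  awk -F'\t' -v top="$top" '
    { score[NR] = $1 + 0; path[NR] = $2; if (score[NR] > best) best = score[NR] }
    END {
      min = best * (1 - top / 100)   # lowest bit score still considered
      kept = 0
      for (i = 1; i <= NR; i++) {
        if (score[i] < min) continue
        k = split(path[i], parts, ";")
        if (kept == 0) {
          # first kept hit: its full lineage is the initial LCA
          for (j = 1; j <= k; j++) lca[j] = parts[j]
          depth = k
        } else {
          # shorten the LCA to the prefix shared with this hit
          d = 0
          for (j = 1; j <= depth && j <= k; j++) {
            if (parts[j] != lca[j]) break
            d = j
          }
          depth = d
        }
        kept++
      }
      out = ""
      for (j = 1; j <= depth; j++) out = out (j > 1 ? ";" : "") lca[j]
      print out
    }'
}
```

For example, with hypothetical hits of bit score 100 to a Bacteroidota protein, 95 to a Fibrobacterota protein and 50 to a Pseudomonadota protein, it prints NCBI;cellular organisms;Bacteria;FCB group: the weak hit falls outside the 10% window, and the two remaining lineages only agree down to FCB group, a node without a major rank.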
Best regards,
Anupam
Thank you very much for your answer, it really helps with my problem. I would also like to ask: should I use DIAMOND blastx’s range-culling mode for a non-redundant gene catalog built from contigs with CD-HIT? Here are the assembly statistics for my cds.fa:
Statistics without reference       cds
(lengths in bp; L50 and L90 are contig counts)
contigs                            1012641
contigs (>= 0 bp)                  1531202
contigs (>= 1000 bp)               453130
contigs (>= 5000 bp)               2274
contigs (>= 10000 bp)              255
contigs (>= 25000 bp)              7
contigs (>= 50000 bp)              0
Largest contig                     38076
Total length                       1121205066
Total length (>= 0 bp)             1284472008
Total length (>= 1000 bp)          709221360
Total length (>= 5000 bp)          16478622
Total length (>= 10000 bp)         3459270
Total length (>= 25000 bp)         225588
Total length (>= 50000 bp)         0
N50                                1194
N90                                660
auN                                1481.9
L50                                317044
L90                                818507
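These rows look like QUAST output and follow the usual definitions: with lengths sorted in decreasing order, Nx is the length of the contig at which the running total first reaches x% of the total length, Lx is how many contigs that took, and auN is the sum of squared lengths divided by the total length. A minimal sketch for recomputing them from a plain list of contig lengths (the function name assembly_stats is hypothetical):

```shell
# Sketch: recompute N50/L50, N90/L90 and auN from contig lengths.
# Input: one contig length (bp) per line on stdin.
assembly_stats() {
  sort -rn | awk '
    { len[NR] = $1; total += $1; sq += $1 * $1 }   # lengths, total, sum of squares
    END {
      cum = 0
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        # Nx: contig length where the running total first reaches x% of the total
        # Lx: number of contigs needed to reach that point
        if (!l50 && cum >= 0.5 * total) { n50 = len[i]; l50 = i }
        if (!l90 && cum >= 0.9 * total) { n90 = len[i]; l90 = i }
      }
      # auN = sum(L_i^2) / sum(L_i), the length-weighted mean contig length
      printf "N50\t%d\nL50\t%d\nN90\t%d\nL90\t%d\nauN\t%.1f\n", n50, l50, n90, l90, sq / total
    }'
}
```

For lengths of 100, 200, 300 and 400 bp it prints N50 300, L50 2, N90 200, L90 3 and auN 300.0. QUAST by default ignores contigs shorter than 500 bp (its --min-contig option), which is probably why the “contigs” row (1012641) is smaller than “contigs (>= 0 bp)” (1531202); to compare against its N50, drop those first, e.g. `awk '$1 >= 500' lengths.txt | assembly_stats`, where lengths.txt is a hypothetical one-length-per-line file.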
Please have a look at these threads, and then you can decide.
OK, I will read the threads you shared carefully before making a decision.
Best regards
Jintian