Question regarding custom ncbi.map / ncbi.tre

I think I have found it already.

Apparently ETE3 can do this. I installed it from the conda repos:
http://etetoolkit.org/download/

Then I wrote a small Python script based on the ETE3 documentation.

For this to work, you only need to change the ROOT_TAXA variable. In this example it’s 2, which is the NCBI TaxID of Bacteria.
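If you only know the organism's name and not its TaxID, ETE3 can look it up. Here is a minimal sketch, assuming `get_name_translator` returns a dict that maps each name to a list of TaxIDs, as the ETE3 docs describe. `resolve_taxid` is just a helper name I made up, it is not part of ETE3.

```python
def resolve_taxid(ncbi, name):
    """Return the single NCBI TaxID for `name`.

    `ncbi` is expected to be an ete3.NCBITaxa instance. Raises ValueError
    when the name is unknown or maps to more than one TaxID.
    """
    hits = ncbi.get_name_translator([name]).get(name, [])
    if not hits:
        raise ValueError(f"No TaxID found for {name!r}")
    if len(hits) > 1:
        # Some names are shared by unrelated taxa, so do not guess.
        raise ValueError(f"{name!r} is ambiguous, candidates: {hits}")
    return hits[0]


# Usage (needs ete3 and a local taxonomy database):
#   from ete3 import NCBITaxa
#   ROOT_TAXA = resolve_taxid(NCBITaxa(), "Escherichia coli")
```

If the helper reports an ambiguous name, pick the right TaxID from the listed candidates by hand and set ROOT_TAXA to that number.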

from ete3 import NCBITaxa
from ete3 import Tree

# NCBI TaxID of the root of the subtree to export (2 = Bacteria)
ROOT_TAXA = 2

# Open the local taxonomy database (ETE3 downloads it on first use),
# then force a fresh download from NCBI so the output is current
ncbi = NCBITaxa()
ncbi.update_taxonomy_database()

# Load the descendants of ROOT_TAXA into memory.
# This extracts what branches out from that node, so for another group of
# organisms ROOT_TAXA is the only value that needs to change.
# Note: with the default intermediate_nodes=False only terminal taxa are returned.
descendants = ncbi.get_descendant_taxa(ROOT_TAXA, collapse_subspecies=False)
names = ncbi.get_taxid_translator(descendants)

# Write one line per TaxID: the ID, a tab, then the corresponding scientific name.
# Opening with "w" creates the file or empties an existing one.
with open("ncbi.map", "w") as ncbimap_out:
    for taxid in descendants:
        print(str(taxid) + "\t" + names[taxid], file=ncbimap_out)

# Grab the descendants again, but this time in tree format
descendants = ncbi.get_descendant_taxa(ROOT_TAXA, collapse_subspecies=False, return_tree=True)

# Convert the NCBI tree to newick, keeping the taxid feature.
# format=3 is the ETE newick format with all branch lengths and all node names:
# http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#trees
t = Tree(descendants.write(features=['taxid']), format=3)

# Write it out (this creates or overwrites ncbi.tre, so no separate open() is needed)
t.write(format=3, outfile="ncbi.tre")

This should generate up-to-date ncbi.tre and ncbi.map files.
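Before using the two files, a quick sanity check does not hurt. The sketch below only assumes what the script above writes: one `taxid<TAB>name` line per entry in ncbi.map and a single newick string in ncbi.tre. `check_outputs` is a made-up helper name, and the parenthesis count is a crude test, not a real newick parser.

```python
def check_outputs(map_path="ncbi.map", tre_path="ncbi.tre"):
    """Run basic consistency checks; return the number of distinct TaxIDs in the map."""
    taxids = set()
    with open(map_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            taxid, sep, name = line.rstrip("\n").partition("\t")
            # Every line must be a numeric TaxID, a tab, and a non-empty name.
            if not sep or not taxid.isdigit() or not name:
                raise ValueError(f"{map_path}:{lineno}: expected 'taxid<TAB>name'")
            taxids.add(int(taxid))

    with open(tre_path) as handle:
        newick = handle.read().strip()
    # A newick string ends with ';' and has balanced parentheses.
    if not newick.endswith(";"):
        raise ValueError(f"{tre_path}: newick string does not end with ';'")
    if newick.count("(") != newick.count(")"):
        raise ValueError(f"{tre_path}: unbalanced parentheses")

    return len(taxids)
```

Calling `check_outputs()` in the directory where the script ran should return the number of entries in ncbi.map, or raise a ValueError that points at the first problem it finds.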
