Using an alternate tree

I’d like to provide a custom set of sequences and a tree. You answered a similar question before: How to use my own taxonomic or functional classification in MEGAN?
However, it was not clear whether this can be done only in the Ultimate Edition. I am competent to modify Java code if necessary. For this purpose, it is perfectly fine to replace the standard NCBI tree; I don’t need more than one available. The Newick format for the tree is likewise not a problem.

Michael Gribskov

Dear Michael
The Edit->Preferences menu has items that allow you to specify an alternative tree.
I am not sure how well tested these items are, so please let me know about any problems that you may encounter.

I think I am making progress.
The ncbi.tre file (Newick format) shows the tree using the NCBI taxonomy IDs.
The .map file maps the taxonomic name to the taxonomy ID.
It looks like the .lvl files specify levels for filtering. Is this entirely free, or are there limits? I notice you use the range 0-100 and cluster the values at the top and bottom of the range. I guess this file may be optional.

But how is the link between the sequence ID and the taxonomic name (found in square brackets) made? My DIAMOND search files contain only the protein ID. Do you actually query the NCBI database to get the taxonomic name?

Here is the scenario: I have a database in FASTA format. Can I just put the taxonomic name in square brackets in the description, build the indices, and search (DIAMOND or BLAST), then load the search file with the appropriate .tre and .map files? I think I’m missing a link.

At present, the .lvl file is ignored and the numbers and ranks that you find there are hard-coded.

If the taxon names are in the database header lines, then you need to tell DIAMOND to store the complete header lines (by default it keeps only the first word of each header line). Then, if you select “parse names” in the appropriate dialog, or use the corresponding command-line option if you are using a MEGAN command-line tool, MEGAN will parse the reference header lines and attempt to match the names that it finds to ones that appear in your file.
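
To make the name-matching idea concrete, here is a minimal Python sketch of the general approach: pull bracketed names out of a reference header and look them up in a name-to-ID table. This is not MEGAN's actual parser. The function name taxon_ids_from_header, the in-memory NAME_TO_ID table, and the example header are all hypothetical and only illustrate the kind of matching described above.

```python
import re

# Hypothetical name-to-ID table; in practice it would be built from your own
# mapping file, whose exact layout is not assumed here.
NAME_TO_ID = {
    "Escherichia coli": 562,
    "Bacillus subtilis": 1423,
}

# Matches text between one pair of square brackets (no nested brackets).
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def taxon_ids_from_header(header, name_to_id):
    """Return the IDs of all bracketed names in the header found in the table."""
    ids = []
    for name in BRACKETED.findall(header):
        taxon_id = name_to_id.get(name.strip())
        if taxon_id is not None:  # names missing from the table are skipped
            ids.append(taxon_id)
    return ids


# Made-up example header with the taxon name in square brackets.
header = ">WP_000001.1 hypothetical protein [Escherichia coli]"
print(taxon_ids_from_header(header, NAME_TO_ID))  # prints [562]
```

The sketch also shows why the complete header line has to be stored: if only the first word (the protein ID) is kept, there is no bracketed name left to match.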