Compare multiple blast-files using command line MEGAN and custom tree

Hi Community,
I am new. I will try to be precise and specific, but I think my question has many sub-questions as my command line skills are rather limited. I have used MEGAN6 via the interface before but I have too many samples this time (158 samples).

My goal: A comparison table across all samples, i.e. taxonomy x abundance (fully resolved, i.e. all possible taxon ranks), based on LCA algorithm with custom settings and custom mapping tree.

My starting point: 158 blastn files in output format 6 with 20 matches per sequence.

My approach: I wanted to first use the tool blast2rma, followed by one of the tools compute-comparison, rma2info or extract-biome. I am not sure which of these is the most appropriate; it seems they can all do the job. However, I can’t find a way to specify the mapping tree in blast2rma. Therefore I wanted to use MEGAN6 UE directly, but I cannot find the command to import multiple files (i.e. all files in a directory), although I saw it mentioned in an old post that this setting should exist (mentioned here).

Seeing that MEGAN seems to need a specific structure in a command script, i.e. one command per line followed by a “;” [semicolon] I am unsure whether I can import all my files in a for loop within the script… (this is related to my limited coding skills).

Can someone point me in the right direction as to how to perform these steps?

Kind regards,
Mathilde

It seems I can't edit my post anymore. I just wanted to add the update that I found the option -mdb, which I understand is for specifying the mapping tree for the blast2rma tool. However, the manual indicates that this should be a .db file, but I have only a .map and a .tre file.

Hi @MDahl,
As far as I know, the specific command structure applies to the Ultimate Edition. You can import the files in a for loop and compare them with the compute-comparison tool.

Best,
Farzan

Thank you for your comment.
I have tried to run this from the directory of my blastn files:
megan/MEGAN -g -E -c megantest.txt

where megantest.txt says:
load mapFile='/mnt/Data/databases/silvamod128v2_megan/silvamod128v2_megan.map';
load treeFile='/mnt/Data/databases/silvamod128v2_megan/silvamod128v2_megan.tre';
for i in *.blastn; do
import blastFile=$i meganFile=$i.rma;
done;
But nothing really happens: there is no error message and no files are generated. MEGAN seems to open for a split second and then disappears again. The terminal just gives me a new line. I am not very skilled at writing these types of scripts, so any feedback would be greatly appreciated!

@MDahl Which edition are you working with? Community Edition or Ultimate Edition?

Ultimate Edition :slight_smile:

Hi again,
I am still stuck on this. I am going back and forth on what is best to use: MEGAN UE directly or some of the stand-alone tools.
Right now I am back to wanting to use blast2lca, but I still face the issue mentioned above regarding the file format for -mdb (mapping database file). Can someone clarify whether only MEGAN-provided databases can be used for this application (blast2lca)?

Sorry for the very late reply, this semester has been super busy…

Here is a script that works for me:

load taxonomyFile='/Users/huson/tmp/silva/tax_slv_ssu_138.1.tre';
load mapFile='/Users/huson/tmp/silva/tax_slv_ssu_138.1.acc_taxid' mapType=Synonyms cName=Taxonomy parseTaxonNames=true;
import blastFile='/Users/huson/tmp/silva/file1.sam' meganFile='/Users/huson/tmp/silva/file1.rma' format=SAM mode=BlastN;
import blastFile='/Users/huson/tmp/silva/file2.sam' meganFile='/Users/huson/tmp/silva/file2.rma' format=SAM mode=BlastN;
quit;

The first line loads the .tre and .map files. Note that you only need to specify the .tre file if the .map file is in the same directory as the .tre file.

Usage:

load taxonomyFile=<filename> [mapFile=<filename>]; - Load taxonomy.tre and taxonomy.map files

The second line loads the accession mapping file (and turns on name parsing for taxa), usage:

load mapFile=<filename> mapType=<Accession|Synonyms|MeganMapDB> cName=<|EC|EGGNOG|GTDB|INTERPRO2GO|KEGG|SEED|Taxonomy> [parseTaxonNames={false|true}];
- Loads a mapping file

The third and fourth lines each process a different file. Note that the MEGAN script parser is very basic and does not support for-loops, so you need to list all import commands explicitly in the script file.
In this example, the files are in SAM format.
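Since the script file has to list every import explicitly, one way to avoid typing 158 lines by hand is to generate the script outside MEGAN. The following is only a minimal sketch in Python, not something taken from the MEGAN documentation: the function name build_megan_script, the placeholder file names taxonomy.tre and taxonomy.map, and the .rma output naming are all assumptions for illustration. The load and import lines simply mirror the working script above. The format value defaults to SAM as in that script; for tabular BLAST output it would need to be changed to whatever name MEGAN's import command lists for that format (possibly BlastTab, but please check the command help in your installation).

```python
from pathlib import Path


def build_megan_script(blast_files, tree_file, map_file, fmt="SAM", mode="BlastN"):
    """Return MEGAN command-script text with one explicit import line per file.

    The command syntax is copied from the working script earlier in this
    thread; fmt and mode are passed through unchanged (assumption: they must
    match names that MEGAN's import command accepts).
    """
    lines = [
        f"load taxonomyFile='{tree_file}';",
        f"load mapFile='{map_file}' mapType=Synonyms cName=Taxonomy parseTaxonNames=true;",
    ]
    for blast in blast_files:
        # Hypothetical naming choice: sample1.blastn -> sample1.rma
        rma = Path(blast).with_suffix(".rma")
        lines.append(
            f"import blastFile='{blast}' meganFile='{rma}' format={fmt} mode={mode};"
        )
    lines.append("quit;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Collect all *.blastn files in the current directory, in sorted order.
    inputs = sorted(Path(".").glob("*.blastn"))
    # taxonomy.tre and taxonomy.map are placeholders for your own files.
    print(build_megan_script(inputs, "taxonomy.tre", "taxonomy.map"), end="")
```

Saved as, say, make_megan_script.py and run from the directory with the BLAST files, the output can be redirected into a file (python make_megan_script.py > megan_import.txt) and that file passed to MEGAN with the -c option, as in the command shown earlier in the thread.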

Dear Daniel,
Thank you for getting back to me.
In the end I solved my problem by reinstalling MEGAN5 CE; for some reason, only this old version will accept our alternative taxonomy. I had to abandon the command line and instead click my way through 158 samples, which was very sub-optimal, but I managed, so the job is done now. For future analysis I actually think CREST (which I think you are also part of) might be a better choice for this particular job (SSU taxa x count tables), whereas MEGAN is a great tool for our mRNA-related analysis.
Kind regards,
Mathilde