MEGAN with magicblast tabular output

Giulia · June 8, 2023, 1:21pm

Hello there. I am using MEGAN6 on Linux with a tabular file produced by MagicBlast against a database of plant genomes. The identifiers are therefore NCBI Genomes accession numbers (NC_* and NZ_*), and the taxonomy assignment works fine.

The problem is that I think the software misinterprets my tabular format and takes the wrong column as the score column.

The MagicBlast tabular output format shows one alignment per line with these tab-delimited fields:

Query/read sequence identifier
Reference sequence identifier
Percent identity of the alignment
Not used
Not used
Not used
Alignment start position on the query sequence
Alignment stop position on the query sequence
Alignment start position on the reference sequence
Alignment stop position on the reference sequence
Not used
Not used
Alignment score
Query strand
Reference sequence strand
Query/read length
Alignment as extended BTOP string. This is the same BTOP string as in BLAST tabular output with a few extensions:

a number represents this many matches,
two bases represent a mismatch and show query and reference base,
base and gap or gap and base, show a gap in query or reference,
^<number>^ represents an intron of this number of bases,
_<number>_ represents an insertion (gap in reference) of this number of bases,
%<number>% represents a deletion (gap in read) of this number of bases,
(<number>) shows the number of query bases that are shared between two parts of a spliced alignment; used when proper splice sites were not found

Number of different alignments reported for this query sequence
Not used
Compartment - a unique identifier for all alignments that belong to a single fragment. These can be two alignments for a pair of reads or alignments to exons that were not spliced.
Reverse complemented unaligned query sequence from the beginning of the query, or '-' if the query aligns to the left edge
Unaligned sequence at the end of the query, or '-'
Reference sequence identifier where the mate is aligned, if different from the identifier in column 2, otherwise '-'
Alignment start position on the reference sequence for the mate, or '-' if no alignment for the mate was found; a negative number denotes a divergent pair
Composite alignment score for all exons that belong to the fragment
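To make the column positions concrete, here is a minimal Python sketch that splits one line of this output into named fields. The field names are labels I made up for the 25 columns listed above (they are not official names), and the example row is invented purely for illustration.

```python
# Hypothetical labels for the 25 tab-delimited columns listed above.
MAGICBLAST_FIELDS = [
    "query_id", "ref_id", "pct_identity", "unused_4", "unused_5", "unused_6",
    "q_start", "q_end", "r_start", "r_end", "unused_11", "unused_12",
    "score", "q_strand", "r_strand", "q_len", "btop", "num_alignments",
    "unused_19", "compartment", "left_overhang", "right_overhang",
    "mate_ref_id", "mate_r_start", "composite_score",
]


def parse_magicblast_line(line):
    """Return a dict of named fields, or None for blank/comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != len(MAGICBLAST_FIELDS):
        raise ValueError(
            f"expected {len(MAGICBLAST_FIELDS)} columns, got {len(cols)}"
        )
    return dict(zip(MAGICBLAST_FIELDS, cols))


# Invented example row (not real data): the score sits in column 13.
example_row = "\t".join([
    "read1", "NC_000000.0", "98.0", "-", "-", "-",
    "1", "100", "500", "599", "-", "-",
    "96", "plus", "plus", "100", "50AG49", "1",
    "-", "1", "-", "-", "-", "-", "96",
])

record = parse_magicblast_line(example_row)
print("column 12:", record["unused_12"])
print("column 13 (score):", record["score"])
```

With a row like this, column 12 holds an unused placeholder while the alignment score is in column 13, which is the mismatch described in this post.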

Instead of taking the value in the 13th column as the score, it takes the 12th column.

Is this because it expects a specific tabular format? What format should I refer to? I’ve noticed that the blasttab format is customizable, so it doesn’t have a standard organization.
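For context, the default BLAST tabular layout (`-outfmt 6` with no custom fields) has 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. If MEGAN assumes that layout, which I have not confirmed, it would read column 12 as the score. Under that assumption, a possible workaround is to rewrite each line into 12 columns with the MagicBlast score in the last position. The sketch below is only an illustration: the function name is hypothetical, the mismatch, gap-open and e-value columns are filled with placeholder zeros, the alignment length is approximated from the query coordinates, and the MagicBlast score is not a bit score.

```python
def magicblast_to_blast12(line, evalue_placeholder="0"):
    """Rearrange one MagicBlast tabular line into a 12-column BLAST-like line.

    Returns None for blank or comment lines. Placeholder values are used
    for columns that MagicBlast does not report (assumption, see above).
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(cols)}")

    q_start, q_end = int(cols[6]), int(cols[7])
    aln_len = abs(q_end - q_start) + 1  # approximation from query span

    return "\t".join([
        cols[0],             # query identifier
        cols[1],             # reference identifier
        cols[2],             # percent identity
        str(aln_len),        # alignment length (approximate)
        "0",                 # mismatches (placeholder)
        "0",                 # gap openings (placeholder)
        cols[6], cols[7],    # query start / stop
        cols[8], cols[9],    # reference start / stop
        evalue_placeholder,  # e-value (placeholder)
        cols[12],            # MagicBlast alignment score, now in column 12
    ])


# Invented example row (not real data).
sample = "\t".join([
    "read1", "NC_000000.0", "98.0", "-", "-", "-",
    "1", "100", "500", "599", "-", "-",
    "96", "plus", "plus", "100", "50AG49", "1",
    "-", "1", "-", "-", "-", "-", "96",
])
converted = magicblast_to_blast12(sample)
print(converted)
```

Whether placeholder e-values and a non-bit-score value interact well with MEGAN's filtering thresholds is something I would still need to check.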

Thanks in advance & best regards,