Hello there. I am using MEGAN6 on Linux with a tabular file produced by MagicBlast against a database of plant genomes. The identifiers are therefore NCBI Genomes accession numbers (NC_* and NZ_*), and the taxonomy assignment works fine.
The problem is that I think the software misunderstands my tabular format and takes the wrong column as the score column.
The tabular output format shows one alignment per line with these tab delimited fields:
- Query/read sequence identifier
- Reference sequence identifier
- Percent identity of the alignment
- Not used
- Not used
- Not used
- Alignment start position on the query sequence
- Alignment stop position on the query sequence
- Alignment start position on the reference sequence
- Alignment stop position on the reference sequence
- Not used
- Not used
- Alignment score
- Query strand
- Reference sequence strand
- Query/read length
- Alignment as extended BTOP string. This is the same BTOP string as in BLAST tabular output with a few extensions:
  - a number represents this many matches,
  - two bases represent a mismatch and show query and reference base,
  - base and gap or gap and base, show a gap in query or reference,
  - `^^` represents an intron of this number of bases,
  - `__` represents an insertion (gap in reference) of this number of bases,
  - `%%` represents a deletion (gap in read) of this number of bases,
  - `()` shows number of query bases that are shared between two parts of a spliced alignment; used when proper splice sites were not found
- Number of different alignments reported for this query sequence
- Not used
- Compartment - a unique identifier for all alignments that belong to a single fragment. These can be two alignments for a pair of reads or alignments to exons that were not spliced.
- Reverse complemented unaligned query sequence from the beginning of the query, or '-' if the query aligns to the left edge
- Unaligned sequence at the end of the query, or '-'
- Reference sequence identifier where the mate is aligned, if different from the identifier in column 2, otherwise '-'
- Alignment start position on the reference sequence for the mate, or '-' if no alignment for the mate was found; a negative number denotes a divergent pair
- Composite alignment score for all exons that belong to the fragment
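
To make the column positions concrete, here is a minimal Python sketch that splits one line of this format into named fields. The field names are my own labels for the 25 columns listed above, not official MagicBlast identifiers, and the example line is made up for illustration.

```python
# Labels for the 25 tab-delimited columns described above.
# These names are illustrative assumptions, not official identifiers.
MAGICBLAST_COLUMNS = [
    "query_id", "ref_id", "pct_identity", "unused_4", "unused_5", "unused_6",
    "query_start", "query_end", "ref_start", "ref_end", "unused_11", "unused_12",
    "score", "query_strand", "ref_strand", "query_length", "btop",
    "num_alignments", "unused_19", "compartment", "query_left_unaligned",
    "query_right_unaligned", "mate_ref_id", "mate_ref_start", "composite_score",
]


def parse_magicblast_line(line):
    """Split one tabular line into a dict keyed by the labels above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(MAGICBLAST_COLUMNS):
        raise ValueError(
            f"expected {len(MAGICBLAST_COLUMNS)} columns, got {len(fields)}"
        )
    return dict(zip(MAGICBLAST_COLUMNS, fields))


# Made-up example line: every value here is hypothetical.
example_fields = [
    "read1", "NC_EXAMPLE.1", "98.5", "-", "-", "-", "1", "100", "5001", "5100",
    "-", "-", "98", "plus", "plus", "100", "100", "1", "-", "1",
    "-", "-", "-", "-", "98",
]
example_line = "\t".join(example_fields)

record = parse_magicblast_line(example_line)
# The alignment score is the 13th column, i.e. index 12 when counting from 0.
print(record["score"], record["unused_12"])
```

With this layout, a parser that reads the 12th column as the score would pick up the unused column (`unused_12`) rather than `score`.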
Instead of taking the value in the 13th column as the score, MEGAN takes the 12th column.
Is this because it expects a specific tabular format? What format should I refer to? I’ve noticed that the blasttab format is customizable, so it doesn’t have a standard organization.
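
For reference, the default BLAST tabular layout (`-outfmt 6`) has 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore, so the score sits in column 12. If MEGAN assumes that default layout (this is my assumption, I have not confirmed it), one possible workaround is to rewrite the file so that the MagicBlast score ends up in column 12. The sketch below is hypothetical: the function name is mine, the example line is made up, and the MagicBlast score is not a bit score, so it may not be interpreted the same way.

```python
def to_twelve_columns(line):
    """Keep columns 1-11 and put the MagicBlast score (column 13) in column 12.

    Comment lines starting with '#' are returned unchanged.
    """
    stripped = line.rstrip("\n")
    if stripped.startswith("#"):
        return stripped
    fields = stripped.split("\t")
    if len(fields) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(fields)}")
    # fields[12] is the 13th column (alignment score); it replaces column 12.
    return "\t".join(fields[:11] + [fields[12]])


# Made-up example line with 25 columns; every value is hypothetical.
sample = "\t".join([
    "read1", "NC_EXAMPLE.1", "98.5", "-", "-", "-", "1", "100", "5001", "5100",
    "-", "-", "98", "plus", "plus", "100", "100", "1", "-", "1",
    "-", "-", "-", "-", "98",
])

converted = to_twelve_columns(sample)
print(converted)
```

The unused columns (4 to 6 and 11) are copied as they are, so whether a downstream parser accepts their placeholder values would still need to be checked.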
Thanks in advance & best regards,