Malt-build does not seem to assign a tax ID to some references


I have been using MALT to align some aDNA samples against a custom database that I generated with malt-build. However, when I look at the SAM output, I see that some reads are mapped to references that are not associated with any tax ID:

```
readX      16      HG530135.1|tax|1231072| ... <- tax ID present
readY      0       AP021861.1              ... <- tax ID missing
```
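To gauge how many references are affected, one option is to list every distinct reference name (SAM column 3, RNAME) that lacks the `|tax|` tag. This is a minimal sketch that assumes tab-separated SAM output and that tagged references always contain the literal string `|tax|`, as in the example above; the function name `list_untagged_refs` is hypothetical.

```shell
# Hypothetical helper: print distinct RNAMEs (column 3) that carry no "|tax|" tag.
# Skips @ header lines and unmapped reads (RNAME "*").
list_untagged_refs() {
  awk -F '\t' '!/^@/ && $3 != "*" && index($3, "|tax|") == 0 { print $3 }' "$1" | sort -u
}

# Usage (file name is an example): list_untagged_refs sample.sam | wc -l
```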

Since this happens for several reference sequences, I did some digging and found that the tax ID is also missing from the MALT database file:

```
grep -ao ">AP021861.1.*" maltDB.dat/ref.db
>AP021861.1 Lacipirellula parvula PX69 DNA, complete
```

However, the same accession is present in the accession2taxid file I used to build the DB:

```
grep AP021861.1 nucl_gb.accession2taxid
AP021861	AP021861.1	2650471	1761017769
```
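To rule out gaps in the mapping file for the whole library rather than one accession, a check along these lines may help. It is a sketch that assumes the first word of each FASTA header in `library.fna` is the bare accession.version, and that column 2 of the accession2taxid file is accession.version (as in the grep output above). It loads only the library accessions into memory and streams the large mapping file; `missing_in_mapping` is a hypothetical name.

```shell
# Hypothetical helper: print library accessions with no row in the accession2taxid file.
# $1 = FASTA library, $2 = accession2taxid (tab-separated, column 2 = accession.version)
missing_in_mapping() {
  awk -F '\t' '
    NR == FNR { if (/^>/) { acc = substr($0, 2); sub(/[ \t].*/, "", acc); want[acc] = 1 }; next }
    ($2 in want) { delete want[$2] }
    END { for (a in want) print a }
  ' "$1" "$2" | sort
}

# Usage: missing_in_mapping library.fna nucl_gb.accession2taxid
```

An empty result would point at the build step rather than the mapping file.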

The database was generated with the following command:
```
malt-build -i library.fna -a2taxonomy ./nucl_gb.accession2taxid -s DNA -t 80 --step 1 -d maltDB.dat
```

MALT version: 0.4.1, built 24 May 2018

Is this a mistake on my side, or am I missing something? I worry that these reads will remain unassigned by the LCA procedure and won't show up when I load the corresponding RMA6 file in MEGAN.

Thanks for your help,

I was experiencing the same issue and worked around it by creating a synonyms file of the form `synonym<TAB>taxid`. I'm not sure whether this is caused by MEGAN or is user error.
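A synonyms file of that `synonym<TAB>taxid` form can be derived from the accession2taxid file. This sketch assumes the synonym should be the accession.version used as the first word of each FASTA header, and restricts output to accessions actually in the library to keep the file small. The helper name is hypothetical, and the malt-build option that reads such a file is not shown here; check `malt-build --help` for your version.

```shell
# Hypothetical helper: write "accession.version<TAB>taxid" lines for library accessions.
# $1 = FASTA library, $2 = accession2taxid (column 2 = accession.version, column 3 = taxid)
make_synonyms() {
  awk -F '\t' '
    NR == FNR { if (/^>/) { acc = substr($0, 2); sub(/[ \t].*/, "", acc); want[acc] = 1 }; next }
    ($2 in want) { print $2 "\t" $3 }
  ' "$1" "$2"
}

# Usage: make_synonyms library.fna nucl_gb.accession2taxid > library.syn
```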

I just took a look at this. Unfortunately, recent versions of MaltBuild completely ignore the taxonomy mapping file. I have fixed this in a new release, V0_5_3.

Hi @Daniel,
I see, now it makes sense that the output looks like that. Thanks for looking into this and fixing the issue.