Hello,
I have been using MALT to align some aDNA samples against a custom database that I generated with malt-build. However, when I look at the SAM output, I see that some of the reads are mapped to references that are not associated with any tax ID:
readX 16 HG530135.1|tax|1231072| ... <- tax ID present
readY 0 AP021861.1 ... <- tax ID missing
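To list every affected reference rather than spotting them by eye, something along these lines could be used. This is only a sketch: the function name refs_without_taxid and the file name sample.sam are made up for illustration, and it assumes a regular tab-separated SAM file in which tagged references contain the string "|tax|" as in the readX line above.

```shell
# Hypothetical helper: count mapped reads per reference name that lacks a "|tax|" tag.
# Assumes a tab-separated SAM file; column 3 is the reference name (RNAME).
refs_without_taxid() {
    awk -F'\t' '
        /^@/ { next }                          # skip SAM header lines
        $3 == "*" { next }                     # skip unmapped reads
        $3 !~ /\|tax\|/ { n[$3]++ }            # reference name carries no tax tag
        END { for (r in n) print n[r] "\t" r }
    ' "$1" | sort -k1,1nr                      # most frequently hit references first
}

# Example call (sample.sam is a placeholder name):
# refs_without_taxid sample.sam
```

Each output line holds the number of reads followed by the untagged reference name.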
Since this happens for several reference sequences, I did some digging and I found out that the tax ID info is also missing in the MALT database file:
grep -ao ">AP021861.1.*" maltDB.dat/ref.db
>AP021861.1 Lacipirellula parvula PX69 DNA, complete
However, the same accession was present in the accession2taxid file I used to build the DB:
grep AP021861.1 nucl_gb.accession2taxid
AP021861 AP021861.1 2650471 1761017769
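The grep above only checks a single accession. To check all references in the library at once, a sketch like the following might help. The function name missing_accessions is invented for illustration, and it assumes that each FASTA header starts with the accession.version (as in the ref.db line above) and that the accession2taxid file has the tab-separated layout shown in the grep output, with accession.version in column 2.

```shell
# Hypothetical helper: print accessions that appear in FASTA headers
# but have no entry in the accession2taxid table.
# $1 = FASTA library, $2 = accession2taxid file (tab-separated, column 2 = accession.version)
missing_accessions() {
    awk -F'\t' '
        NR == FNR {
            # First file (FASTA): remember the first word of each header line.
            if (/^>/) { acc = substr($0, 2); sub(/[ \t].*/, "", acc); want[acc] = 1 }
            next
        }
        ($2 in want) { delete want[$2] }       # second file: accession has a tax ID
        END { for (a in want) print a }        # whatever is left was never found
    ' "$1" "$2" | sort
}

# Example call with the files from this post:
# missing_accessions library.fna nucl_gb.accession2taxid
```

Only the FASTA accessions are held in memory, so the large mapping file is streamed once. An empty result would suggest that the mapping file itself is complete and that the tax IDs are being lost somewhere else.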
The database was generated with the following command:
malt-build -i library.fna -a2taxonomy ./nucl_gb.accession2taxid -s DNA -t 80 --step 1 -d maltDB.dat
MALT version is: 0.4.1, built 24 May 2018
Is this due to a mistake on my side, or is there something I am missing? I worry that these reads will end up unassigned by the LCA procedure and won’t show up when I load the corresponding rma6 file in MEGAN.
Thanks for your help,
Claudio