Malt-run assigning 0 taxonomies on custom malt-build database

Hello Megan/Malt devs and community!

I’m trying to build a custom MALT database for assigning taxonomy to a metagenomic fasta, using malt version 0.6.1 on a remote Linux server. My problem is that malt-run performs the alignment but doesn’t make any taxonomic assignments.

My reference fasta is pretty small: it includes 12,420 mtDNA sequences (16,043,541 bp) with NCBI accession numbers as the headers. I ran malt-build multiple ways, using the -a2t flag with both the megan-nucl-Feb2022.db and NCBI’s nucl_gb.accession2taxid. Malt-build runs without errors, but I get the same issue with malt-run whether I use the megan or the ncbi file.

malt-build code with the megan db file:

malt-build --input PacificNorthWest_PalaeoFaunalReferences.fasta.gz -a2t megan-nucl-Feb2022.db --index malt_index_megandb --sequenceType DNA --verbose

I want to run a weighted LCA on my input fasta.gz file, but I have also tried the naive LCA option, with similarly un-assigned results. Here is the weighted LCA code:

malt-run --inFile SC-10c-TCMP_S207_L001.unmapped.fasta.gz --index malt_index_megandb --mode BlastN --output test_malt_megandb --alignments test_malt_megandb --maxExpected 0.00001 --weightedLCA TRUE --minSupport 3 --minPercentIdentityLCA 95 --lcaCoveragePercent 80 --numThreads 12 --verbose

I’ve run this sample with Kraken and with Bowtie2 to the refseq mitochondrion database and there is a high diversity of taxa so it’s not an issue with the sample.

The job runs without errors. The initial alignment steps seem to be working well, with 106,255 aligned queries and 2,230,669 alignments, so I suspect there isn’t an issue with my input fasta file that I’m trying to assign taxonomies to. I ran the LCA with a variety of more permissive arguments so I don’t think it’s that I’ve been too strict. However, no matter how I set it the LCA computation step assigns 0 total matches, 0 total references, and 0 total weights. When I malt-run with the naive LCA I get “Assig. Taxonomy” of 0. The rma files also show many alignments and 0 assignments. I’ve attached a text file with the run output.

I’m not seeing a similar question in the megan community forum and it’s coming up with different LCA options, so I’m guessing this must be an issue with the way I’m building the database. Can you provide any guidance on this issue?

Thanks very much for any help!

Libby

malt-run_output.txt (3.8 KB)

Hi Libby,

I’ve found recently that the .db file doesn’t work properly when building databases… I had similar results to yours.

I think the .db support in MALT has been broken for a few releases.

Using the ‘deprecated’ -a2t flag with an accession2taxid file instead still works when building the database (on the advice of a colleague).

When I did it recently, these are the commands I used from my notes (raw copying, some of the flags might not be relevant for you)

    ## OR DOWNLOAD a2t
    wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz

    ## Malt BUILD command
    cd k2_db_maixnerhitgenomes/
    malt-build -i *.gz -s DNA -d malt_db -t 14 -t 8 -a2t ~/cache/databases/acc2taxid/nucl_gb.accession2taxid.gz -v 2>&1 | tee malt-build.log

Thanks so much for your help James!

I fetched a fresh gzipped ncbi accession2taxid file and tried again with the same results. You and I are using very similar arguments, so if this workaround works for you, I think I can narrow it down to either an input file error or a version issue on my end. What version of malt are you running? Maybe I can start there.

Just got it to run. It was a super silly error on my part: I was using -a2t instead of --mapDB with the megan .db file (thanks for pointing out that -a2t is deprecated James, that’s what got me there :] ). So, interestingly, James, I’m seeing the opposite behaviour to yours. I’m not sure why that would be, but hopefully this helps someone else troubleshooting version 0.6.1 some day.

malt-build --input PacificNorthWest_PalaeoFaunalReferences.fasta.gz --mapDB megan-nucl-Feb2022.db --index malt_index --sequenceType DNA --verbose 2>&1 | tee malt_index/malt-build.log

Hi @jfy133, I’m looking through my megan results and noticing a lot of my taxa are being misclassified. I think this is because their accessions are not in the megan db file, so they don’t get a taxon id (eg Columbian mammoth hits score much higher, but the reads end up classified as Asian elephant).

These accessions are in the ncbi accession2taxid file, so I would prefer to get the malt db built the way you suggested, but I’ve tried out several different versions of malt and can’t seem to get any of them to build the index correctly with the accession2taxid file and -a2t (or --acc2taxa, or any alternative name the flag went by in older versions of malt). Would you be willing to share an example fasta header? Maybe my problem is with the input fasta, whose headers are accession numbers only, like so:

>CM051845
GTTAATGTAGCTTAATACAAAGCAAAGCACTGAAAATGCTTAGACGAGTCATTC.....

and the malt-build code

malt-build --input PacificNorthWest_PalaeoFaunalReferences.fasta --sequenceType DNA --index malt_ncbi_db -t 14 -a2t nucl_gb.accession2taxid -v 2>&1 | tee malt_ncbi_db/malt-build.log
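One way to test the "accession is missing from the mapping file" idea is to look each reference accession up in the accession2taxid table directly. The sketch below assumes an uncompressed NCBI-style table (tab-separated columns: accession, accession.version, taxid, gi) and headers whose first word is the accession. `check_accessions` is a hypothetical helper name, not part of MALT.

```shell
# Hypothetical helper: list every FASTA accession with its taxid, or MISSING.
# Usage: check_accessions <reference.fasta> <accession2taxid, uncompressed>
check_accessions() {
    fasta="$1"
    a2t="$2"
    awk -F'\t' '
        # First file (FASTA): collect the first word of each header line.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                split(substr($0, 2), parts, " ")
                wanted[parts[1]] = 1
            }
            next
        }
        # Second file (accession2taxid): stream it and match either the
        # bare accession (column 1) or accession.version (column 2).
        ($1 in wanted) { found[$1] = $3 }
        ($2 in wanted) { found[$2] = $3 }
        END {
            for (acc in wanted) {
                if (acc in found) print acc "\t" found[acc]
                else print acc "\tMISSING"
            }
        }
    ' "$fasta" "$a2t" | LC_ALL=C sort
}
```

For example, `check_accessions PacificNorthWest_PalaeoFaunalReferences.fasta nucl_gb.accession2taxid | grep MISSING` would list any references without a taxon id. Only the FASTA accessions are held in memory, so the large mapping file is read once as a stream.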

Thanks for any additional help you can provide!

Hrm, then I’m even more confused :sweat_smile:

In the database I referenced above, an example header of one of them is:

>NC_008527.1 Lactococcus cremoris subsp. cremoris SK11, complete sequence

But strange that it works… although I think I was working with 0.6.2…

Okay, I did some fiddling and I got it to run! It seems there needs to be a species name in the fasta headers; the accession id alone isn’t enough (whether it’s the bare accession or accession.version doesn’t appear to matter). Should be easy enough to get those.
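For anyone needing to do the same, here is a minimal sketch of adding names to accession-only headers. It assumes you already have a two-column, tab-separated lookup file (accession, then organism name); both that file and the `add_names` helper are hypothetical and not part of MALT. The output mirrors the `>accession name` layout of the example header above.

```shell
# Hypothetical helper: append an organism name to each FASTA header.
# Usage: add_names <reference.fasta> <names.tsv> > renamed.fasta
add_names() {
    fasta="$1"
    names="$2"   # assumed format: accession<TAB>organism name
    awk -F'\t' '
        # First file (lookup table): remember the name for each accession.
        NR == FNR { name[$1] = $2; next }
        # FASTA header lines: take the first word as the accession.
        /^>/ {
            acc = substr($0, 2)
            sub(/[ \t].*/, "", acc)
            if (acc in name) print ">" acc " " name[acc]
            else print $0          # no name known: leave header unchanged
            next
        }
        # Sequence lines pass through untouched.
        { print }
    ' "$names" "$fasta"
}
```

The accession in the lookup file has to match the header exactly (with or without the version suffix), and headers without a match are passed through so they can be spotted afterwards.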

Thanks so much for your help @jfy133, you’re the forum help hero :]