Rma2info Empty output

pkaps · July 13, 2021, 1:12pm

Hello,

I ran malt with a custom database that was created with the latest megan-nucl-Jan2021.db.zip file. Almost all of the reads aligned, yet when I ran rma2info on the resulting rma file, the output was empty. However, if I create a smaller database using a subset of the reference sequences and run the same sample, taxonomies are output. For reference, the full database contains all bacterial species and Hg19. What could be causing this difference? Any guidance here is appreciated.

gi|89030144|ref|NT_113911.1
gi|224514656|ref|NT_167229.1
gi|224514661|ref|NT_167233.1
gi|224514635|ref|NT_113950.2
gi|224514652|ref|NT_167225.1
gi|224514650|ref|NT_167223.1
gi|251831106|ref|NC_012920.1

Bacterial reads are similar, but have names:

gi|379725073|ref|NC_016937.1| Francisella tularensis subsp. tularensis TI0902
gi|379716390|ref|NC_016933.1| Francisella tularensis subsp. tularensis TIGB03
gi|384162394|ref|NC_017189.1| Bacillus amyloliquefaciens LL3
gi|384162404|ref|NC_017190.1| Bacillus amyloliquefaciens LL3

Malt output:
Starting file: a/test1.rma
10% 20% 30% 40% 50% 60% 100% (6.2s)
Finishing file: a/test1.rma
Binning reads: Initializing...
Initializing binning...
Using Best-Hit algorithm for binning: Taxonomy
Binning reads...
Binning reads: Analyzing alignments
Total reads: 4,668
With hits: 4,668
Alignments: 37,194
Assig. Taxonomy: 0
MinSupport set to: 1
Binning reads: Writing classification tables
Numb. Tax. classes: 1
Binning reads: Syncing
Class. Taxonomy: 1
Analysis written to file: a/test1.rma
Num. of queries: 5000
Aligned queries: 4668
Num. alignments: 37194
Total time: 585s
Peak memory: 178.6 of 380G

Daniel · July 16, 2021, 3:12pm

The problem is that the reference sequences are labelled with GI numbers, which I believe were discontinued a number of years ago. The MEGAN mapping database does not contain GI numbers.

While the headers also contain other accessions, the mapping code currently requires that the accession comes first.
I will look into this.
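
Until the mapping code handles this, one possible workaround is to rewrite the reference headers so that the accession comes first before building the database. The following is only a sketch, not part of MEGAN or MALT: the function names `accession_first` and `rewrite_fasta` are made up for illustration, and it assumes every header follows the legacy `gi|<number>|<db>|<accession>|` layout shown above. Whether a given accession is actually present in the mapping database still has to be checked separately.

```python
import re

# Legacy NCBI header: gi|<number>|<db>|<accession>| followed by an optional description.
# Assumption: all headers in the reference FASTA follow this layout.
GI_HEADER = re.compile(r"^gi\|\d+\|[a-z]+\|([^|\s]+)\|?\s*(.*)$")


def accession_first(header: str) -> str:
    """Return the header (without '>') with the accession moved to the front."""
    match = GI_HEADER.match(header)
    if match is None:
        return header  # not a GI-style header, leave it unchanged
    accession, description = match.groups()
    return f"{accession} {description}".rstrip()


def rewrite_fasta(lines):
    """Yield FASTA lines, rewriting header lines and passing sequence lines through."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + accession_first(line[1:].rstrip("\n")) + "\n"
        else:
            yield line
```

To use it on a file, open the original FASTA for reading and a new file for writing, and pass the output of `rewrite_fasta(infile)` to `outfile.writelines`.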

pkaps · July 16, 2021, 4:52pm

Thank you for your help. I have also tried building a database with a synonyms file and aligning using that database. My synonym file is of the format accession <tab> taxid. There is a synonym for every entry in my reference, but some classified reads are not assigned a taxonomy. Please let me know if there is a way to fix this.

Using Best-Hit algorithm for binning: Taxonomy
Binning reads...
Binning reads: Analyzing alignments
Total reads:            4,668
With hits:               4,668 
Alignments:             37,194
Assig. Taxonomy:         4,552
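
To double-check the claim that every reference entry has a synonym, a small script along these lines could be used. It is a rough sketch under assumptions: the helper names `load_synonyms` and `headers_without_synonym` are hypothetical, the synonyms file is taken to be `accession <tab> taxid` as described above, and a header counts as covered if its first word, or any `|`-separated part of it, appears as a key. This only tests textual presence; it does not reproduce how malt itself matches synonyms.

```python
def load_synonyms(lines):
    """Parse 'accession<TAB>taxid' lines into a dict, skipping blank lines."""
    synonyms = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        key, taxid = line.split("\t")[:2]
        synonyms[key.strip()] = taxid.strip()
    return synonyms


def headers_without_synonym(headers, synonyms):
    """Return the headers for which no candidate key is found in the synonyms."""
    unmatched = []
    for header in headers:
        words = header.lstrip(">").split()
        if not words:
            continue  # empty header line, nothing to check
        first_word = words[0]
        # Candidate keys: the whole first word plus each '|'-separated part.
        candidates = {first_word, *first_word.split("|")}
        candidates.discard("")
        if candidates.isdisjoint(synonyms):
            unmatched.append(header)
    return unmatched
```

Any header returned by `headers_without_synonym` would be a candidate explanation for reads that aligned but received no taxonomy.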

I have also saved the alignments for this run. The references that have a species name in the header had an additional |tax|taxid field added in the alignment output, but those that did not have a species name, or had a species name that was outdated, did not have this additional field. I believe that reads that were assigned to these references were not given a taxonomy. Example:

CP068211.1_seq3-1       gi|568125122|ref|NC_023061.1|   97.8    45      1       0       4       48      3205098 3205143 3e-12   78
CP068211.1_seq4-1       gi|528981796|ref|NC_021285.1|tax|1167634        100.0   75      0       0       1       75      6365604 6365679 1e-29   137
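
To see which references are affected, the saved alignments can be scanned for subject IDs that lack the `|tax|` field. This is a minimal sketch, assuming the alignments are in BLAST tabular format with the subject ID in the second column, as in the two lines above; the function name `references_without_taxid` is made up for illustration.

```python
from collections import Counter


def references_without_taxid(lines):
    """Count alignments per subject ID that carries no '|tax|' field."""
    missing = Counter()
    for line in lines:
        # Split on any whitespace so both tab- and space-separated copies work.
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        subject = fields[1]
        if "|tax|" not in subject:
            missing[subject] += 1
    return missing


example = [
    "CP068211.1_seq3-1\tgi|568125122|ref|NC_023061.1|\t97.8\t45\t1\t0\t4\t48\t3205098\t3205143\t3e-12\t78",
    "CP068211.1_seq4-1\tgi|528981796|ref|NC_021285.1|tax|1167634\t100.0\t75\t0\t0\t1\t75\t6365604\t6365679\t1e-29\t137",
]
for subject, count in references_without_taxid(example).items():
    print(subject, count)
```

The resulting list of subject IDs can then be compared against the synonyms file to see whether those entries are missing or simply not being matched.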

Finally, I took the blast output of malt-run and ran it through blast2rma and rma2info. All reads were now assigned a taxonomy, but the classifications were different from those produced by malt, which I assume is expected. Now I’m unsure if I can simply use the blast results from malt with my synonyms file to get the taxonomies.

With hits:               4,668 
Alignments:             37,194
Assig. Taxonomy:         4,668

Daniel · August 4, 2021, 6:47am

That should be OK, I believe.