Hi all,
I have an issue with malt-build regarding the complexity of the resulting database. As input for the database I include CDS from different genomes, and in the end all my plant genomes collapse into the wheat genome. I tested this with a random sample of FASTA sequences from another plant species (one that is part of the database input), and no matches were found. Can I control this somehow with --maxHitsPerSeed or another parameter? Thanks!
Here is the log file of the run; as far as I can tell, it looks fine:
MaltBuild - Builds an index for MALT (MEGAN alignment tool)
Options:
Input:
--input: .........
--sequenceType: DNA
Output:
--index: ...
Performance:
--threads: 10
--step: 1
Seed:
--shapes: default
--maxHitsPerSeed: 1000
Classification support:
--mapDB: .../megan-nucl-Feb2022.db
Deprecated classification support:
--parseTaxonNames: true
--noFun: false
Other:
--firstWordIsAccession: true
--accessionTags: gb| ref|
--firstWordOnly: false
--random: 666
--hashScaleFactor: 0.9
--buildTableInMemory: true
--extraStrict: false
--verbose: true
Version MALT (version 0.5.3, built 4 Aug 2021)
Author(s) Daniel H. Huson
Copyright (C) 2021 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Classifications to use: Taxonomy
Reference sequence type set to: DNA
Seed shape(s): 111110111011110110111111
Deleting index files: 0
Number input files: 517
Loading FastA files:
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1466.6s)
Number of sequences: 186,748,193
Number of letters:101,888,004,433
BUILDING table (0)...
Seeds found: 101,887,763,290
tableSize= 2,147,483,639
hashMask.length=31
maxHitsPerHash set to: 1000
Initializing arrays...
100% (0.0s)
Analysing seeds...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (3808.2s)
Number of low-complexity seeds skipped: 755,530,672
Allocating hash table...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (464.8s)
Total keys used: 2,145,121,347
Total seeds matched:91,421,691,322
Total seeds dropped: 2,364,654,292
Opening file: table0.db
Allocating: 689.1 GB
Filling hash table...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (11780.6s)
Randomizing rows...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (605.8s)
Writing file: table0.idx
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (97.1s)
Writing file: table0.db
Size: 689.1 GB
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (6264.4s)
Writing file: index0.idx
100% (0.1s)
Loading ncbi.map: 2,302,807
Loading ncbi.tre: 2,302,811
Building mappings...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (474948.8s)
Writing file: taxonomy.idx
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2.8s)
Writing file: ref.db
Writing file: ref.idx
100% (285.9s)
Total time: 500,362s
Peak memory: 900.5 of 1900G
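
In case it helps, here is a small Python sketch I used to put the seed counters from the log above into proportion. It only parses the four counter lines copied verbatim from the log and expresses each as a share of "Seeds found". The helper name parse_counters is my own, and I am assuming (without confirmation from the MALT documentation) that "Total seeds dropped" is related to the maxHitsPerHash limit of 1000; the script itself makes no claim about what the counters mean.

```python
import re

# Counter lines copied verbatim from the MaltBuild log above.
LOG = """\
Seeds found: 101,887,763,290
Number of low-complexity seeds skipped: 755,530,672
Total seeds matched:91,421,691,322
Total seeds dropped: 2,364,654,292
"""


def parse_counters(log_text):
    """Return {label: integer} for every 'label: 1,234,567' line (hypothetical helper)."""
    counters = {}
    for line in log_text.splitlines():
        match = re.match(r"^(.*?):\s*([\d,]+)\s*$", line)
        if match:
            label = match.group(1).strip()
            counters[label] = int(match.group(2).replace(",", ""))
    return counters


counters = parse_counters(LOG)
found = counters["Seeds found"]

# Report each counter as a share of all seeds found.
for label in (
    "Number of low-complexity seeds skipped",
    "Total seeds matched",
    "Total seeds dropped",
):
    share = counters[label] / found
    print(f"{label}: {counters[label]:,} ({share:.2%} of seeds found)")
```

If I read this correctly, only a small share of the seeds was dropped, so I am not sure whether raising --maxHitsPerSeed alone would change the result.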