Feature request: longer sample names

IamIamI · April 14, 2021, 8:26am

Good day,

I was wondering whether it would be possible to extend the number of characters allowed in output file names when using automatic sample processing.
I’m referring to the case where someone passes a list of input files with option -i and specifies only an output folder with -o.

For example, when running the following

/mnt/archgen/users/lesley_sitter/Software/Malt/malt-run \
-J-Xmx40G \
-d malt_index \
-id 95 \
-i ZTB001.A0101.SG1.1.Human_Shotgun \
ZTB001.A0101.SG1.2.Human_Shotgun \
ZTB002.A0101.SG1.1.Human_Shotgun \
ZTB002.A0101.SG1.2.Human_Shotgun \
-oa aligned_reads_folder/ \
-o rma6_output_folder/

The sample names used for the aligned read files are truncated but still usable (the Human_Shotgun part is chopped off, but in my case that doesn’t matter):
aligned_reads_folder/ZTB001.A0101.SG1.1-aligned.fna.gz
aligned_reads_folder/ZTB001.A0101.SG1.2-aligned.fna.gz
aligned_reads_folder/ZTB002.A0101.SG1.1-aligned.fna.gz
aligned_reads_folder/ZTB002.A0101.SG1.2-aligned.fna.gz

But for the actual rma6 files, two more characters are cut off, so the output for the first file of each sample is overwritten when the second file is processed:
rma6_output_folder/ZTB001.A0101.SG1.rma6
rma6_output_folder/ZTB002.A0101.SG1.rma6

After some testing, I now know this can be overcome by also specifying each individual output path in -oa and -o. With thousands of samples, though, that becomes very error-prone, so it would be nice if the full input name were simply used as the prefix of the output name instead of being chopped up.
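
For anyone who needs the workaround in the meantime, here is a minimal Python sketch of generating the explicit per-file paths rather than typing them by hand. It is only an illustration: the function name explicit_output_paths is made up, and the "-aligned.fna.gz" and ".rma6" endings are assumptions copied from the output listings above, not anything MALT requires.

```python
from pathlib import PurePosixPath


def explicit_output_paths(inputs, aligned_dir, rma6_dir):
    """Build one aligned-reads path and one rma6 path per input file.

    The full input file name is kept as the prefix, so nothing is truncated.
    Raises ValueError if two inputs would end up with the same output path.
    """
    aligned, rma6 = [], []
    for path in inputs:
        name = PurePosixPath(path).name  # file name without its directory
        # Assumed naming scheme, mirroring the listings in this post.
        aligned.append(str(PurePosixPath(aligned_dir) / f"{name}-aligned.fna.gz"))
        rma6.append(str(PurePosixPath(rma6_dir) / f"{name}.rma6"))

    for label, paths in (("aligned", aligned), ("rma6", rma6)):
        if len(set(paths)) != len(paths):
            raise ValueError(f"duplicate {label} output paths")
    return aligned, rma6


inputs = [
    "ZTB001.A0101.SG1.1.Human_Shotgun",
    "ZTB001.A0101.SG1.2.Human_Shotgun",
    "ZTB002.A0101.SG1.1.Human_Shotgun",
    "ZTB002.A0101.SG1.2.Human_Shotgun",
]
aligned, rma6 = explicit_output_paths(inputs, "aligned_reads_folder/", "rma6_output_folder/")

# Print the argument lists so they can be pasted after -oa and -o.
print("-oa", " ".join(aligned))
print("-o", " ".join(rma6))
```

The uniqueness check is there because a silent overwrite is exactly the failure described above; with thousands of samples it is cheaper to fail before the run starts.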

But of course, this is just a request.

Cheers,
Lesley

Daniel · April 28, 2021, 8:09am

Hi Lesley,

The problem is that MALT expects your input files to have a proper file suffix, such as .fasta or .fq. Yours don’t have one, so the program identifies .Human_Shotgun as the suffix and removes it.

This appears to affect the naming of the rma6 files as well. There it looks like .Human_Shotgun was first identified as a suffix and removed, and then .1 and/or .2 were removed too.
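
To illustrate the effect, here is a small Python sketch. It is not MALT's actual code, just a stand-in that removes a given number of trailing dot-separated parts from each name and reports which inputs end up sharing an output name. The helper names strip_trailing_extensions and find_collisions are invented for this example.

```python
from collections import defaultdict


def strip_trailing_extensions(name, count):
    """Remove up to `count` trailing '.something' parts from `name`."""
    for _ in range(count):
        stem, dot, _ext = name.rpartition(".")
        if not dot:  # no dot left, nothing more to strip
            break
        name = stem
    return name


def find_collisions(names, count):
    """Map each stripped name to its inputs, keeping only shared ones."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_trailing_extensions(name, count)].append(name)
    return {stem: members for stem, members in groups.items() if len(members) > 1}


inputs = [
    "ZTB001.A0101.SG1.1.Human_Shotgun",
    "ZTB001.A0101.SG1.2.Human_Shotgun",
    "ZTB002.A0101.SG1.1.Human_Shotgun",
    "ZTB002.A0101.SG1.2.Human_Shotgun",
]

# Stripping one part (as for the aligned reads) keeps all names distinct.
print(find_collisions(inputs, 1))
# Stripping two parts (as for the rma6 files) merges the .1 and .2 files.
print(find_collisions(inputs, 2))
```

With one part removed, no two names coincide; with two parts removed, each pair of .1/.2 files collapses onto a single stem, which matches the overwriting reported above.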

So, please try giving your input file names a suffix and I will look into modifying the code so that it can deal with input files that don’t have a suffix in their name.