Last or DIAMOND?

The Last aligner is reportedly many times faster than DIAMOND on small problems, and several times faster on very large datasets.

For example, when aligning 60 million DNA reads of length 101 against 110 million reference proteins, Last is just over 3 times as fast as DIAMOND (287 minutes vs 900 minutes wall clock, running on a Linux server with 32 cores and 512GB of memory).

However, there are a couple of issues that make Last currently less suitable for use together with MEGAN:

  • It can’t read compressed input files, so such files have to be uncompressed before the run and recompressed afterwards.
    Given the number and size of fastq files that most projects involve, this is annoying.
  • The output is not sorted by query. For efficiency, when importing a file, MEGAN streams through the file once and assumes that all matches for a given query appear consecutively. So, if you import data directly from Last, the same read may be counted multiple times. (Perhaps it is possible to change this by setting appropriate command-line options?) A tool called maf-sort.sh for sorting MAF file entries exists, but it appears to sort by subject sequence, not by query sequence… (see the sketch after this list).
  • Last does not provide a compact format for describing the details of alignments. For the above-mentioned dataset, the Last output file in MAF format is 420GB in size. In contrast, the DAA file produced by DIAMOND is only 17GB. While it would take MEGAN too long to analyze the output of Last, it takes only 500 minutes to perform the complete taxonomic and functional analysis and indexing of all reads and alignments in the file produced by DIAMOND.
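
To illustrate what “sorted by query” means here, a minimal sketch, assuming a BLAST-style tab file whose first column is the query name (hits.tab is just a placeholder name):

# group all matches of each query together so that MEGAN sees them consecutively
sort -k1,1 hits.tab > hits.sorted.tab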

In summary, Last is faster than DIAMOND (and reportedly more sensitive), but at present does not produce output that can easily be imported into MEGAN.

MEGAN (V6.6.8+) is able to import Last’s MAF format, but with the caveat that instances of the same read occurring in different parts of the input file are treated as different reads.

I have run quite a number of projects with LAST, especially before DIAMOND existed, but now also when I have non-coding nucleotide data.

  1. One can definitely run LAST on compressed input. It’s just a matter of designing a pipeline. I have had best performance (no extensive tests) using the parallel-fasta script bundled with LAST, and my pipeline looks like this:
gunzip -c $< | parallel-fasta --keep-order --no-notice -P 0 "lastal -e200 -Q1 ./refseq_rna.lastdb" | gzip -c > $@
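
For readers not using make: $< and $@ are make’s automatic variables for the prerequisite and the target, so outside of a Makefile the same pipeline looks something like this (sample.fastq.gz and sample.maf.gz are placeholder names):

gunzip -c sample.fastq.gz | parallel-fasta --keep-order --no-notice -P 0 "lastal -e200 -Q1 ./refseq_rna.lastdb" | gzip -c > sample.maf.gz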

I am having problems, however, saturating the CPUs. This is quite recent and I haven’t had time to do a lot of testing.

  2. When I’ve imported this into MEGAN, I’ve reformatted it to the BLAST-like tabular output, which involves a rather horrible awk one-liner. The good thing about this approach is that the output can be sorted on the command line in a pipeline.

Here’s the awk one-liner including the sort (native support in MEGAN would of course be better):

awk '/^[^\#]/ { print $$7 "\t" $$2 "\t100\t" $$4 "\t0\t0\t" $$8 "\t" $$11 "\t" $$3 "\t" $$6 "\t1e-100\t" $$1 }' | sort -k 1,1 -k 12,12rn
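
To unpack the one-liner (the $$ is make’s escaping of $, since it is used as a make macro below): as far as I can tell from the maf-convert tab column order, it fills the twelve BLAST tab columns roughly as follows:

# BLAST tab column      taken from
# qseqid                $7   (query name)
# sseqid                $2   (reference name)
# pident                100  (dummy value)
# length                $4   (alignment length on the reference)
# mismatch, gapopen     0, 0 (dummy values)
# qstart, qend          $8, $11
# sstart, send          $3, $6
# evalue                1e-100 (dummy value)
# bitscore              $1   (the raw LAST score)

The final sort then groups all matches by query (column 1) and orders them by score (column 12), highest first.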

and the pipeline as a make recipe, using the awk one-liner as a macro:

gunzip -c $< | maf-convert tab | $(AWK_LASTTAB2MEGANTAB) | gzip -c > $@

Note that both the input file and the output file are read/written in gzipped format.
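
Put together, a Makefile skeleton for this could look roughly as follows (the pattern rules and file suffixes here are only illustrative, the database name is the one used above, and recipe lines must start with a tab):

AWK_LASTTAB2MEGANTAB = awk '/^[^\#]/ { print $$7 "\t" $$2 "\t100\t" $$4 "\t0\t0\t" $$8 "\t" $$11 "\t" $$3 "\t" $$6 "\t1e-100\t" $$1 }' | sort -k 1,1 -k 12,12rn

# run LAST on gzipped reads, keeping the MAF output gzipped
%.maf.gz: %.fastq.gz
	gunzip -c $< | parallel-fasta --keep-order --no-notice -P 0 "lastal -e200 -Q1 ./refseq_rna.lastdb" | gzip -c > $@

# convert the gzipped MAF to sorted, gzipped BLAST-like tab output for MEGAN
%.megantab.gz: %.maf.gz
	gunzip -c $< | maf-convert tab | $(AWK_LASTTAB2MEGANTAB) | gzip -c > $@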

  3. LAST’s scores are much larger than BLAST’s, so one has to set parameters differently (see the sketch below).
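
For example (only a sketch; file and database names are placeholders): the -e200 used above is a raw LAST score cutoff, which cannot be compared directly to a BLAST bit-score or E-value cutoff. lastal’s -D option offers an E-value-like alternative, reporting alignments expected by chance at most once per that many query letters:

# raw-score cutoff: keep alignments with LAST score >= 200
lastal -e200 -Q1 ./refseq_rna.lastdb reads.fastq > reads.maf

# E-value-style cutoff: at most one chance alignment per 1000000 query letters
lastal -D1000000 -Q1 ./refseq_rna.lastdb reads.fastq > reads.maf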

In summary, it would be great to have better support for LAST output (in any form), especially for non-coding RNA datasets.

/Daniel

Dear Daniel, thanks for sharing your insights!

Here is a comparison of Last vs DIAMOND on a 32-core server with 512GB of memory. I ran DIAMOND with the option -b 20, which increases the “block size” used by DIAMOND. With this option, DIAMOND memory usage peaked at about 120 GB. The input file contains 60 million human gut reads (length 101bp) and the reference database is NCBI-nr. Here are the results:

a) Last, with tab output: 4.4 hours
Output file size: 185GB
50 million reads aligned, 1.8 billion alignments
Import this output into MEGAN: 31 hours
RMA file size: 81GB
Total: 35 hours

b) Last, with MAF output: 4.7 hours
Output file size: 400GB
50 million reads aligned, 1.8 billion alignments
Import this output into MEGAN: too long

c) Diamond, with tab output: 5 hours
Output file size: 58GB
45 million reads aligned, 683 million alignments
Import this output into MEGAN: 22 hours
RMA file size: 35GB
Total: 27 hours

d) Diamond, with DAA output: 5 hours
Output file size: 15GB
45 million reads aligned, 683 million alignments
Meganize this output: 68 minutes
File size after meganization: 17GB
Total: just over 6 hours

This comparison is not fair to Last, because Last produces nearly three times as many alignments (1.8 billion vs 683 million).
I will look into setting options for both programs so that they target a similar set of alignments.

Currently, the fastest way to produce a file that can be opened in MEGAN is (d).
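
In outline, (d) looks like this (a sketch only: file and database names are placeholders, and the mapping-file options that daa-meganizer needs depend on the MEGAN version, so they are omitted here):

# 1) align the reads against NCBI-nr, writing DAA output (-f 100), with a large block size
diamond blastx -d nr -q reads.fastq.gz -b 20 -f 100 -o sample.daa

# 2) "meganize" the DAA file (daa-meganizer ships with MEGAN) so that it can be
#    opened directly in MEGAN; the taxonomy/function mapping files go here
daa-meganizer -i sample.daa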

Diamond is more sensitive than Last, at least according to a standard SCOP-based ROC analysis. Short-read alignment is currently about 2x faster with the latest version.


That is a huge improvement.

We have recently added a new “long read” mode to MEGAN to handle PacBio and Nanopore reads, and contigs.
For such data, at present, LAST performs better than DIAMOND, because LAST performs a frame-shift aware alignment. Both long-read technologies appear to produce many sequencing-error indels, and the resulting frame-shifts break DNA-to-protein alignments that have to maintain the reading frame.
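
For reference, LAST’s frame-shift aware DNA-to-protein mode is enabled roughly like this (a sketch; database and file names are placeholders, and -F sets the frame-shift cost, with 15 just an example value):

# build a protein database from the reference proteins
lastdb -p nr_prot nr.faa

# align long DNA reads against it with frame-shift aware translated alignment
lastal -F15 nr_prot long_reads.fasta > long_reads.maf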

I plan to add support for frameshift alignments.

The difference between the programs is even clearer in sensitive mode: Diamond in sensitive mode runs almost 100x faster than Last with -m10000 while still being more sensitive.

The benchmark is not perfectly fair, because the timing measurement was done on short reads while the SCOP analysis is, of course, a full-length protein benchmark; but the gap between the two programs is still very evident.

@buchfink can you post your DIAMOND and LAST run parameters? We’re interested in how typical DIAMOND runs like these are performed, particularly in an HPC environment such as a typical cluster (where you could write to NFS, local disk, or shared memory). I’m also interested in seeing how these compare across different file systems at some point; we’ve seen about 2x variation in speed with different network file systems (GPFS vs Ceph, with Ceph being faster).

For the timing benchmark, Diamond was run with -b10 -c1 -f100, with the output written to /dev/shm. Last was run with -P 32 -f blasttab. For the sensitive benchmark, Diamond was run with --sensitive and Last with -m 10000.