Total meganized reads imported not matching total sequence reads

Hi Everyone,

I am new to MEGAN, so I might be missing something. When I meganize reads, the total number of reads the program recognizes is less than half of the number in the FASTA file. I am using MEGAN version 6.21.5 and trying to view BLASTn assignments for a FASTA file of ONT reads using the GUI.

When I import the files, MEGAN does not recognize all sequences: it reports total reads as 206, which is less than half of what should be there. When I grep the BLAST output file and the FASTA file that are used as input for MEGAN, they both return 456 reads. Even when I subtract the unassigned reads in the BLAST file, this does not get me down to 206.

It’s weird because it doesn’t throw an error or even acknowledge that these reads are going missing, so I’m not sure if I’m missing a step, but even exporting the assignments from MEGAN shows only 206 reads.

Any ideas?
Thanks!

Output from MEGAN -
Executing: import blastFile='/Volumes/MyPassport/MinION/4-22-21_18SeDNA/PipelineOutput/5_BLAST/BARCODE17_medakapolishBLAST_remotent.txt' fastaFile='/Volumes/MyPassport/MinION/4-22-21_18SeDNA/PipelineOutput/4_medaka/BARCODE17/consensus.fasta' meganFile='/Volumes/MyPassport/MinION/4-22-21_18SeDNA/MEGAN/071020M_medakapolishBLAST_remotent.rma6' useCompression=true format=BlastTab mode=BlastN maxMatches=100 minScore=1.0 maxExpected=0.01 minPercentIdentity=0.0 topPercent=10.0 minSupportPercent=0.05 lcaAlgorithm=naive minPercentReadToCover=0.0 minPercentReferenceToCover=0.0 minReadLength=0 useIdentityFilter=false readAssignmentMode=readCount fNames=;
Classifications: Taxonomy
Annotating RMA6 file using EXTENDED mode
Parsing file: /Volumes/MyPassport/MinION/4-22-21_18SeDNA/PipelineOutput/5_BLAST/BARCODE17_medakapolishBLAST_remotent.txt
Total reads: 206
Alignments: 1,112
Initializing binning…
Using ‘Naive LCA’ algorithm for binning: Taxonomy
Binning reads…
Total reads: 206
With hits: 206
Alignments: 1,112
Assig. Taxonomy: 205
MinSupport set to: 1
Min-supp. changes: 1
Numb. Tax. classes: 35
Class. Taxonomy: 35

MEGAN thinks there are only 206 sequences in the input… If you can give me access to the two files (alignments and reads) then I can look into this.

This appears to be long-read data… If so, please select the “long read” option when importing such data, as the default algorithms are designed for short reads and won’t work well on long reads.

Hi Daniel,

I just sent the files to your university email.

I tried both the long read option and the naive LCA algorithm, and either way MEGAN thinks there are only 206 sequences. The reads are longer, but they should only cover one gene.

Thanks,
Laura

Dear Laura,

Thank you for the files. I count 209 sequences in each file:

grep -c ">" *fasta
209

grep -c Query *txt
209

Of these, 3 do not have any hits:

grep -c " 0 hits found" *txt
3

MEGAN imports 206, which is exactly the 209 sequences minus the 3 without hits. I don’t see why you are expecting to get 456 reads…
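
The same check can be wrapped into two small shell functions. This is only a minimal sketch, assuming the BLAST file is in commented tabular format (with "# Query:" and "# 0 hits found" lines, as the grep commands above suggest). The function names count_fasta_reads and expected_megan_reads are hypothetical helpers, not part of MEGAN or BLAST, and MEGAN filters such as maxExpected could lower the imported number further.

```shell
# Hypothetical helper: count sequences in a FASTA file.
# Anchoring on '^>' only counts header lines, not '>' inside descriptions.
count_fasta_reads() {
    grep -c '^>' "$1"
}

# Hypothetical helper: estimate how many reads MEGAN should report,
# i.e. queries in the BLAST file minus queries without any hits.
# Assumes commented tabular BLAST output ("# Query:" lines).
expected_megan_reads() {
    blast_file="$1"
    # One "# Query:" line is written per query sequence.
    queries=$(grep -c '^# Query:' "$blast_file")
    # Queries for which BLAST reported no hits at all.
    no_hits=$(grep -c '^# 0 hits found' "$blast_file")
    echo $((queries - no_hits))
}
```

Run as, for example, count_fasta_reads reads.fasta and expected_megan_reads blast.txt (both file names are placeholders). With the counts above (209 queries, 3 without hits), the second function would print 206.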

Hi Daniel,
That is really strange; when I grepped them, they both returned 456. I will check on another computer to see whether that changes anything.

Thanks,

The reads file only has 418 lines, so it definitely can’t contain 456 reads…

wc -l *fasta
418 Barcode17ConsensusSeqs.fasta

Yeah, I see what the issue is now: absolutely user error. The files I was grepping and the ones I imported were from different groups. I don’t know how I missed that before. Thank you again!
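
For anyone who runs into the same mismatch, one way to confirm that the FASTA file and the BLAST file really belong together is to compare their read IDs. The following is a rough sketch, again assuming commented tabular BLAST output with "# Query:" lines; compare_read_ids is a hypothetical helper name and the file names in the usage note are placeholders.

```shell
# Hypothetical helper: list read IDs that appear in only one of the two files.
# Prints nothing when the FASTA and BLAST files describe the same reads.
compare_read_ids() {
    fasta_file="$1"
    blast_file="$2"
    fasta_ids=$(mktemp)
    blast_ids=$(mktemp)
    # FASTA IDs: header lines without the leading '>' and any description.
    grep '^>' "$fasta_file" | sed 's/^>//' | cut -d ' ' -f 1 \
        | LC_ALL=C sort -u > "$fasta_ids"
    # BLAST IDs: first word after "# Query:" in the commented tabular output.
    grep '^# Query:' "$blast_file" | sed 's/^# Query: *//' | cut -d ' ' -f 1 \
        | LC_ALL=C sort -u > "$blast_ids"
    # comm -3 drops IDs common to both lists; IDs found only in the
    # BLAST file are printed indented by a tab.
    LC_ALL=C comm -3 "$fasta_ids" "$blast_ids"
    rm -f "$fasta_ids" "$blast_ids"
}
```

Calling compare_read_ids reads.fasta blast.txt should print nothing if both files contain the same set of reads. Any output points to reads present in only one of the files, which is a hint that the files being counted are not the ones being imported.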