Hi, I have used MALT and MaltExtract to map some samples against the full NCBI fungi database as part of the HOPS pipeline. The problem is that I’m getting two cases of duplicate row names in my MaltExtract outputs, which means I can’t run the HOPS post-processing step. The first is Candida and the second is Cryptococcus. Since I am most interested in the latter, I was thinking I could just remove all lines corresponding to Candida from the output files. However, I do want to pull out the mapping stats for Cryptococcus. One of the duplicated Cryptococcus lines doesn’t have any reads mapped to it, so I assume it should be fine to remove it from RunSummary.txt, but I don’t know what to remove from the damageMismatch.txt file. The relevant output is shown below.
Perhaps there is a way to fix it in either my original database or the rma6 files before MaltExtract is run? I already tried removing certain .fna files from my database before running malt-build, but that did not resolve the problem.
Any help appreciated!
Candida duplication in RunSummary.txt:
(hops2) [he11@login-a default]$ grep "Candida" RunSummary.txt
Candida 7 1 5 2 9 4 4
Candida 702 715 345 4388 526 964 875
Candida_albicans 76 39 23 7 56 44 45
Candida_albicans_A155 0 0 0 0 0 0 0
Candida_albicans_A67 0 0 0 0 0 0 1
Candida_albicans_CHN1 1 0 1 0 0 1 0
Candida_albicans_Ca529L 2 2 0 0 0 0 1
Candida_albicans_Ca6 1 0 0 0 1 1 1
Candida_albicans_P34048 0 1 0 0 0 0 0
Candida_albicans_P37037 1 0 0 0 0 0 0
Candida_albicans_P37039 1 0 0 0 0 0 0
Candida_albicans_P57072 0 0 0 0 0 1 0
Candida_albicans_P60002 1 0 0 0 0 0 0
Candida_albicans_P75010 1 0 0 0 0 0 0
Candida_albicans_P75016 0 0 0 0 0 1 0
Candida_albicans_P94015 0 0 0 0 0 0 1
Candida_albicans_SC5314 1 0 0 0 2 0 0
Candida_albicans_WO-1 0 0 0 0 1 0 0
Candida_corydali 1 2 2 0 1 0 2
Candida_dubliniensis 39 20 8 2 20 21 8
Candida_dubliniensis_CD36 4 1 2 0 3 2 1
Candida_maltosa_Xu316 12 3 4 0 5 9 5
Candida_metapsilosis 9 6 2 3 5 5 11
Candida_orthopsilosis 1 9 1 0 0 0 2
Candida_orthopsilosis_AY2 4 4 0 0 3 0 1
Candida_orthopsilosis_Co_90-125 5 2 1 0 4 3 4
Candida_orthopsilosis_MCO456 49 13 6 7 27 16 5
Candida_oxycetoniae 21 2 9 2 12 11 12
Candida_parapsilosis 56 25 20 85 41 25 72
Candida_parapsilosis_GA1 7 2 0 1 5 1 0
Candida_sanyaensis 21 9 4 1 12 14 5
Candida_sojae 9 0 6 4 4 5 1
Candida_sp._JCM_15000 0 4 1 0 1 1 1
Candida_sp._LDI48194 2 1 1 0 0 0 1
Candida_theae 8 1 3 0 3 4 4
Candida_tropicalis 38 35 14 9 32 58 16
Candida_tropicalis_MYA-3404 2 0 1 0 0 0 0
Candida_viswanathii 0 2 1 0 4 1 3
Cryptococcus duplication in RunSummary.txt:
(hops2) [he11@login-a default]$ grep "Cryptococcus" RunSummary.txt
Cryptococcus 0 0 0 0 0 0 0
Cryptococcus 1393 4022 204 3239 892 3523 1144
Cryptococcus_amylolentus 0 2 0 0 0 2 3
Cryptococcus_amylolentus_CBS_6039 2 1 1 0 0 3 1
Cryptococcus_amylolentus_CBS_6273 2 0 0 0 2 2 1
Cryptococcus_depauperatus 0 0 0 0 0 2 1
Cryptococcus_depauperatus_CBS_7841 0 0 2 0 0 0 2
Cryptococcus_depauperatus_CBS_7855 0 1 1 1 0 0 1
Cryptococcus_gattii_CA1280 2 0 0 0 0 0 0
Cryptococcus_gattii_CA1873 0 0 0 0 0 2 0
Cryptococcus_gattii_EJB2 0 0 0 0 0 0 1
Cryptococcus_gattii_NT-10 2 1 1 0 0 2 2
Cryptococcus_gattii_Ru294 0 0 0 0 0 2 1
Cryptococcus_gattii_VGI 2 2 0 0 4 7 1
Cryptococcus_gattii_VGII 4 0 1 0 0 1 1
Cryptococcus_gattii_WM276 1 2 0 0 1 0 0
Cryptococcus_gattii_species_complex 0 0 0 0 1 1 0
Cryptococcus_neoformans 13 19 7 0 6 24 9
Cryptococcus_neoformans_AD_hybrid 0 1 0 0 0 2 1
Cryptococcus_neoformans_species_complex 0 1 2 0 1 0 1
Cryptococcus_neoformans_var._grubii 32 27 12 6 23 32 14
Cryptococcus_neoformans_var._grubii_125.91 1 0 0 0 0 0 0
Cryptococcus_neoformans_var._grubii_A5-35-17 0 0 0 0 1 0 0
Cryptococcus_neoformans_var._grubii_Br795 12 4 2 1 5 4 2
Cryptococcus_neoformans_var._grubii_Bt1 0 1 0 0 0 0 0
Cryptococcus_neoformans_var._grubii_Bt15 0 0 0 0 0 0 0
Cryptococcus_neoformans_var._grubii_Bt63 0 1 0 0 0 0 0
Cryptococcus_neoformans_var._grubii_C23 0 0 0 0 0 1 0
Cryptococcus_neoformans_var._grubii_CHC193 1 2 1 0 1 2 0
Cryptococcus_neoformans_var._grubii_D17-1 3 2 1 0 3 2 2
Cryptococcus_neoformans_var._grubii_MW-RSA1955 2 0 1 0 0 0 0
Cryptococcus_neoformans_var._grubii_MW-RSA36 0 0 0 0 0 0 0
Cryptococcus_neoformans_var._grubii_MW-RSA852 0 0 1 0 0 0 0
Cryptococcus_neoformans_var._grubii_Tu401-1 0 1 0 0 0 0 1
Cryptococcus_neoformans_var._grubii_c45 1 1 0 0 0 1 0
Cryptococcus_neoformans_var._neoformans 0 0 0 0 0 0 2
Cryptococcus_neoformans_var._neoformans_B-3501A 0 1 0 0 1 0 0
Cryptococcus_neoformans_var._neoformans_JEC21 0 0 0 0 2 0 1
Cryptococcus_neoformans_var._neoformans_XL280 1 0 0 0 0 0 0
Cryptococcus_sp._05/00 68 189 19 0 68 206 12
Cryptococcus_sp._JCM_24511 12 34 5 1 13 32 4
Cryptococcus_wingfieldii 1 2 1 0 2 9 0
Cryptococcus_wingfieldii_CBS_7118 2 0 1 0 1 2 1
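A minimal sketch of the "remove the empty duplicate" idea for RunSummary.txt. It assumes the file is whitespace- or tab-delimited with the node name in the first column and read counts in the remaining columns; the function names are hypothetical and this is not part of HOPS or MaltExtract. It only drops an all-zero row when another row with the same name has reads, so it would handle the Cryptococcus case above but leave both Candida rows untouched, since both of them have counts.

```python
from collections import defaultdict


def row_total(fields):
    """Sum the count columns of a row; return None if any column is non-numeric
    (for example a header line)."""
    try:
        return sum(float(x) for x in fields[1:])
    except ValueError:
        return None


def drop_empty_duplicates(lines):
    """Drop all-zero rows whose name also appears on a row that has reads."""
    groups = defaultdict(list)
    for idx, line in enumerate(lines):
        fields = line.split()
        if fields:
            groups[fields[0]].append((idx, row_total(fields)))

    to_drop = set()
    for rows in groups.values():
        if len(rows) < 2:
            continue  # name is unique, nothing to do
        # Only remove zero rows if a sibling row carries actual counts.
        if any(total for _, total in rows):
            to_drop.update(idx for idx, total in rows if total == 0)

    return [line for idx, line in enumerate(lines) if idx not in to_drop]


# Rows copied from the grep output above (tab-delimited by assumption).
example = [
    "Candida\t7\t1\t5\t2\t9\t4\t4",
    "Candida\t702\t715\t345\t4388\t526\t964\t875",
    "Cryptococcus\t0\t0\t0\t0\t0\t0\t0",
    "Cryptococcus\t1393\t4022\t204\t3239\t892\t3523\t1144",
]
cleaned = drop_empty_duplicates(example)
for line in cleaned:
    print(line)
```

Whether the other per-sample files (such as damageMismatch.txt) can be cleaned the same way depends on their column layout, which this sketch does not cover.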
And Cryptococcus in readDist/*_alignmentDist.txt:
(hops2) [he11@login-a default]$ grep "Cryptococcus" readDist/S1.rma6_alignmentDist.txt
Cryptococcaceae Cryptococcus 1 2 2 2 2430807
Cryptococcus Cryptococcus 0.972 270 285 319 2099666
Cryptococcus NA 0 0 0 0 0
Cryptococcus_amylolentus Cryptococcus_amylolentus_CBS_6273 1 1 1 1 1425963
Cryptococcus_amylolentus_CBS_6039 Cryptococcus_floricola 1 1 1 1 92997
Cryptococcus_amylolentus_CBS_6273 Cryptococcus_amylolentus_CBS_6273 1 1 1 1 1093067
Cryptococcus_depauperatus Cryptococcus_depauperatus_CBS_7855 1 2 2 2 939986
Cryptococcus_gattii_CA1873 Purpureocillium_takamizusanense 1 1 1 1 1096568
Cryptococcus_gattii_NT-10 Cryptococcus_gattii_NT-10 1 2 2 2 4847
Cryptococcus_gattii_Ru294 Cryptococcus_gattii_Ru294 1 1 1 1 1133708
Cryptococcus_gattii_VGI Cryptococcus_gattii_EJB2 1 1 1 3 346420
Cryptococcus_gattii_VGII Cryptococcus_gattii_VGII 0 0 2 2 1116177
Cryptococcus_gattii_WM276 Cryptococcus_gattii_WM276 0 0 2 2 1325755
Cryptococcus_gattii_species_complex Cryptococcus_gattii_Ru294 1 1 1 1 536216
Cryptococcus_neoformans Cryptococcus_neoformans_var._grubii_H99 0.604 6 10 10 1621675
Cryptococcus_neoformans_AD_hybrid Cryptococcus_neoformans_AD_hybrid 1 1 1 1 84756
Cryptococcus_neoformans_species_complex NA 0 0 0 0 0
Cryptococcus_neoformans_var._grubii Cryptococcus_neoformans_var._grubii_A1-35-8 0.487 7 16 17 174521
Cryptococcus_neoformans_var._grubii_Br795 NA 0.035 0 4 4 1344473
Cryptococcus_neoformans_var._grubii_Bt15 Cryptococcus_neoformans_var._grubii_Bt15 0.043 0 2 2 3096
Cryptococcus_neoformans_var._grubii_C23 Cryptococcus_neoformans_var._grubii_C23 1 1 1 1 2917
Cryptococcus_neoformans_var._grubii_CHC193 Cryptococcus_neoformans_var._grubii_CHC193 1 1 1 1 3675
Cryptococcus_neoformans_var._grubii_D17-1 Cryptococcus_neoformans_var._grubii_D17-1 0.036 0 4 4 2878
Cryptococcus_neoformans_var._grubii_c45 Cryptococcus_neoformans_var._grubii_c45 0.083 0 2 2 153309
Cryptococcus_sp._05/00 Cryptococcus_sp._05/00 0.371 3 17 18 6454
Cryptococcus_sp._JCM_24511 Cryptococcus_sp._JCM_24511 1 6 6 8 2178591
Cryptococcus_wingfieldii Cryptococcus_wingfieldii 1 2 2 2 1480951
Cryptococcus_wingfieldii_CBS_7118 Cryptococcus_wingfieldii_CBS_7118 1 1 1 1 606306
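To see which node names are repeated in any of these per-sample files before editing them, a small check like the following could help. It is only a sketch under the assumption that the node name is the first whitespace-delimited field on each line; `duplicated_names` is a hypothetical helper, not a HOPS function.

```python
from collections import defaultdict


def duplicated_names(lines):
    """Map each repeated first-column name to the 1-based line numbers
    on which it occurs."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if fields:
            seen[fields[0]].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


# First three lines of the alignmentDist grep output above
# (tab-delimited by assumption).
sample = [
    "Cryptococcaceae\tCryptococcus\t1\t2\t2\t2\t2430807",
    "Cryptococcus\tCryptococcus\t0.972\t270\t285\t319\t2099666",
    "Cryptococcus\tNA\t0\t0\t0\t0\t0",
]
print(duplicated_names(sample))
```

On these lines the only repeated name is Cryptococcus, and the second of its two rows is the one with NA as reference and zeros throughout.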