Figure 1: (A) Paired-end Read1 and Read2 are mapped far apart due to a translocation event in the sample genome. (B) Paired-end Read3 and Read4 are mapped with the same orientation due to an inversion event in the sample genome.
| Dataset | # of short reads | Read length | SNV | Indels | TNL | INV |
| --- | --- | --- | --- | --- | --- | --- |
| Sim_Ecoli | 1,472,676 | 100 | 12,822 | 1,035 | 6 | 3 |
| Sim_Chr1 | 78,859,696 | 100 | 627,576 | 52,067 | 261 | 211 |
| Sim_HG | 978,158,892 | 100 | 8,003,103 | 666,213 | 2,650 | 2,559 |
| SRR6062143 | 1,558,904,052 | 101 | 3,084,732 | 534,739 | NA | NA |
| SRR7781445 | 609,362,230 | 151 | 3,084,732 | 534,739 | NA | NA |
| SRR3440404 | 487,042,582 | 250 | 3,076,552 | 519,569 | NA | NA |
| SRR6691661 | 638,722,972 | 151 | 3,076,552 | 519,569 | NA | NA |
Table 1: The benchmark datasets. Sim_Ecoli, Sim_Chr1, and Sim_HG are synthetic datasets. SRR6062143 and SRR7781445 are generated from NA12878 (HG001); SRR3440404 and SRR6691661 are generated from NA24385 (HG002). TNL: translocation; INV: inversion
| Pipeline | SNV precision (%) | SNV recall (%) | INDEL precision (%) | INDEL recall (%) | Runtime (minutes) |
| --- | --- | --- | --- | --- | --- |
| Sim_Ecoli | | | | | |
| MapCaller | 100 | 99.3 | 99.9 | 98.9 | 0.1 |
| BWA + Freebayes | 99.6 | 99.5 | 100 | 98.7 | 9.6 |
| BWA + Mpileup | 100 | 99.2 | 100 | 98.7 | 3.5 |
| BWA + GATK | 99.3 | 76.4 | 100 | 15.9 | 66.4 |
| Sim_Chr1 | | | | | |
| MapCaller | 99.9 | 98.5 | 99.8 | 98.0 | 8.3 |
| BWA + Freebayes | 99.3 | 99.0 | 99.9 | 98.3 | 611.5 |
| BWA + Mpileup | 100 | 98.6 | 99.9 | 97.9 | 277.7 |
| BWA + GATK | 99.5 | 78.4 | 99.6 | 17.1 | 3264.2 |
| Sim_HG | | | | | |
| MapCaller | 99.2 | 96.9 | 99.1 | 96.3 | 146.1 |
| BWA + Freebayes | 99.2 | 92.3 | 99.9 | 91.6 | 6954.7 |
| BWA + Mpileup | 44.7 | 91.8 | 99.9 | 91.0 | 3414.1 |
| BWA + GATK | 99.4 | 71.7 | 99.8 | 14.9 | 36812.3 |
Table 2: The performance comparison on synthetic datasets. GATK's low recall is due to the high sequencing error rate (2%) and low sequencing coverage (30X). We investigate the effect of sequencing error rate and coverage in the Supplementary material.
| Pipeline | SNV precision (%) | SNV recall (%) | INDEL precision (%) | INDEL recall (%) | Runtime (hours) |
| --- | --- | --- | --- | --- | --- |
| SRR6062143 | | | | | |
| MapCaller | 76.2 | 98.8 | 66.4 | 89.2 | 2.0 |
| BWA + Freebayes | 70.8 | 96.2 | 61.3 | 91.8 | 122.7 |
| BWA + Mpileup | 22.3 | 97.4 | 70.1 | 73.5 | 70.2 |
| BWA + GATK | 77.1 | 97.3 | 67.1 | 95.5 | 203.8 |
| SRR7781445 | | | | | |
| MapCaller | 78.6 | 97.3 | 66.2 | 88.3 | 1.5 |
| BWA + Freebayes | 72.1 | 96.1 | 58.3 | 95.0 | 122.3 |
| BWA + Mpileup | 17.8 | 97.4 | 68.0 | 71.4 | 71.8 |
| BWA + GATK | 77.3 | 97.2 | 63.9 | 96.0 | 161.8 |
| SRR3440404 | | | | | |
| MapCaller | 76.4 | 98.3 | 66.0 | 89.8 | 2.0 |
| BWA + Freebayes | 67.3 | 95.8 | 60.1 | 95.5 | 116.4 |
| BWA + Mpileup | 8.5 | 99.7 | 66.8 | 73.1 | 60.9 |
| BWA + GATK | 77.1 | 99.7 | 68.7 | 95.7 | 227.9 |
| SRR6691661 | | | | | |
| MapCaller | 77.5 | 97.9 | 60.7 | 91.3 | 1.5 |
| BWA + Freebayes | 70.4 | 95.8 | 58.1 | 94.8 | 96.5 |
| BWA + Mpileup | 16.8 | 99.4 | 66.6 | 76.2 | 49.7 |
| BWA + GATK | 78.2 | 99.8 | 63.2 | 99.0 | 157.1 |
Table 3: The performance comparison on real datasets.
| Data set | Method | Translocation precision (%) | Translocation recall (%) | Inversion precision (%) | Inversion recall (%) |
| --- | --- | --- | --- | --- | --- |
| Sim_Ecoli | MapCaller | 100 | 100 | 100 | 100 |
| | LUMPY | 100 | 66.7 | 100 | 100 |
| | SVDetect | 100 | 100 | 100 | 100 |
| | DELLY | NA | NA | 100 | 100 |
| Sim_Chr1 | MapCaller | 100 | 94.3 | 100 | 87.7 |
| | LUMPY | 84.2 | 33.3 | 99.5 | 92.9 |
| | SVDetect | 27.2 | 97.7 | 64.8 | 93.8 |
| | DELLY | NA | NA | 96.9 | 95.7 |
| Sim_HG | MapCaller | 99.9 | 94.8 | 100 | 87.8 |
| | LUMPY | 87.9 | 35.3 | 100 | 93.7 |
| | SVDetect | 28.9 | 95.4 | 62.3 | 93.1 |
| | DELLY | NA | NA | 99.1 | 95.2 |
Table 4: The performance comparison of translocation and inversion detection on synthetic datasets.
Figure 2: An example of position frequency matrix (PFM). Five read alignments are used to count the frequency of A, C, G and T at each position. Nucleobases in red are sequencing errors, and those in blue are SNPs
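The counting step described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not MapCaller's implementation: the function name `build_pfm`, the `(start, sequence)` read representation, and the restriction to gapless alignments are assumptions made only for illustration.

```python
# Minimal sketch of a position frequency matrix (PFM).
# Assumption: each alignment is a gapless (start, sequence) pair, where
# start is the 0-based reference position of the first base of the read.

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def build_pfm(alignments, ref_length):
    """Count how often A, C, G and T are observed at each reference position."""
    pfm = [[0, 0, 0, 0] for _ in range(ref_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Skip bases that fall outside the reference or are not A/C/G/T (e.g. N).
            if 0 <= pos < ref_length and base in BASE_INDEX:
                pfm[pos][BASE_INDEX[base]] += 1
    return pfm


if __name__ == "__main__":
    reads = [(0, "ACG"), (1, "CGT"), (1, "CTT")]
    for pos, counts in enumerate(build_pfm(reads, 4)):
        print(pos, dict(zip(BASES, counts)))
```

In such a matrix, a base seen in only one read at a position is more likely a sequencing error, while a non-reference base supported by most reads at that position suggests a variant. How those counts are thresholded is not specified here.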
Figure 3: (A) Concordant pairs; (B) discordant pairs (translocation); (C) discordant pairs (inversion).
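The distinction between concordant and discordant pairs can be sketched as a simple classifier, following only what the Figure 1 and Figure 3 captions state: mates mapped far apart suggest a translocation, and mates mapped with the same orientation suggest an inversion. The `MappedRead` type, the `classify_pair` function, the `max_insert` threshold, and the order of the checks are hypothetical choices for illustration, not the criteria used by MapCaller.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    chrom: str      # reference sequence the read maps to
    pos: int        # 0-based mapping position
    forward: bool   # True if mapped to the forward strand


def classify_pair(r1, r2, max_insert=1000):
    """Label a read pair as concordant or as a discordant SV candidate.

    max_insert is an assumed upper bound on the distance between mates;
    a real pipeline would typically estimate it from the library.
    """
    # Mates with the same orientation hint at an inversion.
    if r1.forward == r2.forward:
        return "inversion"
    # Mates on different references, or too far apart, hint at a translocation.
    if r1.chrom != r2.chrom or abs(r1.pos - r2.pos) > max_insert:
        return "translocation"
    return "concordant"


if __name__ == "__main__":
    a = MappedRead("chr1", 10_000, True)
    print(classify_pair(a, MappedRead("chr1", 10_350, False)))
    print(classify_pair(a, MappedRead("chr1", 900_000, False)))
    print(classify_pair(a, MappedRead("chr1", 10_350, True)))
```

A single discordant pair is weak evidence on its own. Callers usually require several pairs supporting the same breakpoint before reporting an event.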