Commit 09d605a

DeepSomatic 1.10.0 release
1 parent 5823b2d commit 09d605a

16 files changed, +232 -439 lines changed

README.md

Lines changed: 6 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# DeepSomatic
22

3-
[![release](https://img.shields.io/badge/release-v1.9.0-green?logo=github)](https://github.com/google/deepvariant/releases)
3+
[![release](https://img.shields.io/badge/release-v1.10.0-green?logo=github)](https://github.com/google/deepvariant/releases)
44
[![announcements](https://img.shields.io/badge/announcements-blue)](https://groups.google.com/d/forum/deepvariant-announcements)
55
[![blog](https://img.shields.io/badge/blog-orange)](https://goo.gl/deepvariant)
66

@@ -21,19 +21,19 @@ end-to-end testing and feature development of DeepVariant.
2121

2222
Here are the scripts that describe the core components of DeepSomatic:
2323

24-
* [run_deepsomatic](https://github.com/google/deepvariant/blob/r1.9/scripts/run_deepsomatic.py):
24+
* [run_deepsomatic](https://github.com/google/deepvariant/blob/r1.10/scripts/run_deepsomatic.py):
2525
The DeepSomatic runner script.
2626

27-
* [make_examples_somatic](https://github.com/google/deepvariant/blob/r1.9/deepvariant/make_examples_somatic.py):
27+
* [make_examples_somatic](https://github.com/google/deepvariant/blob/r1.10/deepvariant/make_examples_somatic.py):
2828
The `make_examples` step for DeepSomatic.
2929

30-
* [call_variants](https://github.com/google/deepvariant/blob/r1.9/deepvariant/call_variants.py):
30+
* [call_variants](https://github.com/google/deepvariant/blob/r1.10/deepvariant/call_variants.py):
3131
Inference script that generates the variant calls.
3232

33-
* [postprocess_variants](https://github.com/google/deepvariant/blob/r1.9/deepvariant/postprocess_variants.py):
33+
* [postprocess_variants](https://github.com/google/deepvariant/blob/r1.10/deepvariant/postprocess_variants.py):
3434
Updated with `process_somatic` option to process somatic variants.
3535

36-
* [dockerfile](https://github.com/google/deepvariant/blob/r1.9/Dockerfile.deepsomatic):
36+
* [dockerfile](https://github.com/google/deepvariant/blob/r1.10/Dockerfile.deepsomatic):
3737
The Dockerfile for DeepSomatic.
3838

3939
Integrating DeepSomatic within DeepVariant helps to maintain
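
A version bump like the one in this README hunk is easy to leave half-finished. The sketch below is one hypothetical way to scan a checkout for leftover 1.9 references. The function name `find_stale_versions` and the search pattern are assumptions made for illustration and are not part of the repository.

```bash
# Hypothetical helper: list lines under a directory that still mention the
# previous release (links to the r1.9 branch or the 1.9.0 version string).
find_stale_versions() {
  # $1: directory to scan.
  # -r: recurse, -n: show line numbers, -E: extended regular expressions.
  # `|| true` keeps the exit status at 0 when grep finds no matches.
  grep -rnE 'r1\.9/|1\.9\.0' "$1" || true
}
```

Calling `find_stale_versions docs` from the repository root would print any remaining matches. Empty output suggests the bump is complete for that directory.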

docs/deepsomatic-case-study-ffpe-wes-tumor-only.md

Lines changed: 3 additions & 3 deletions
@@ -69,7 +69,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6969
### Running on a CPU-only machine
7070

7171
```bash
72-
BIN_VERSION="1.9.0"
72+
BIN_VERSION="1.10.0"
7373

7474
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
7575

@@ -124,7 +124,7 @@ The output:
124124

125125
```
126126
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
127-
0 indels 7 13 6 7 1 0 0 0.857143 0.499203 1.000000 0.857143 0.461538 0.221123 0.717108 0 0 248956422 0.028117
127+
0 indels 7 12 5 7 2 0 0 0.714286 0.352338 0.935272 0.714286 0.416667 0.180479 0.688060 0 0 248956422 0.028117
128128
1 SNVs 145 129 99 30 46 0 0 0.682759 0.603975 0.754328 0.682759 0.767442 0.689149 0.833890 0 0 248956422 0.120503
129-
5 records 152 142 105 37 47 0 0 0.690789 0.614250 0.760141 0.690789 0.739437 0.662930 0.806296 0 0 248956422 0.148620
129+
5 records 152 141 104 37 48 0 0 0.684211 0.607379 0.754102 0.684211 0.737589 0.660674 0.804865 0 0 248956422 0.148620
130130
```

docs/deepsomatic-case-study-ffpe-wes.md

Lines changed: 3 additions & 3 deletions
@@ -63,7 +63,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6363
### Running on a CPU-only machine
6464

6565
```bash
66-
BIN_VERSION="1.9.0"
66+
BIN_VERSION="1.10.0"
6767

6868
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
6969

@@ -115,6 +115,6 @@ The output:
115115
```
116116
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
117117
0 indels 7 10 7 3 0 0 0 1.000000 0.590384 1.000000 1.000000 0.700000 0.394182 0.907305 0 0 248956422 0.012050
118-
1 SNVs 145 124 121 3 24 0 0 0.834483 0.767678 0.888096 0.834483 0.975806 0.936851 0.993140 0 0 248956422 0.012050
119-
5 records 152 134 128 6 24 0 0 0.842105 0.777939 0.893394 0.842105 0.955224 0.910050 0.981097 0 0 248956422 0.024101
118+
1 SNVs 145 123 121 2 24 0 0 0.834483 0.767678 0.888096 0.834483 0.983740 0.948867 0.996606 0 0 248956422 0.008034
119+
5 records 152 133 128 5 24 0 0 0.842105 0.777939 0.893394 0.842105 0.962406 0.919578 0.985513 0 0 248956422 0.020084
120120
```

docs/deepsomatic-case-study-ffpe-wgs-tumor-only.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6868
### Running on a CPU-only machine
6969

7070
```bash
71-
BIN_VERSION="1.9.0"
71+
BIN_VERSION="1.10.0"
7272

7373
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
7474

docs/deepsomatic-case-study-ffpe-wgs.md

Lines changed: 6 additions & 5 deletions
@@ -62,7 +62,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6262
### Running on a CPU-only machine
6363

6464
```bash
65-
BIN_VERSION="1.9.0"
65+
BIN_VERSION="1.10.0"
6666

6767
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
6868

@@ -90,7 +90,8 @@ the different model types, different flags are needed in the `make_examples`
9090
step.
9191

9292
`--intermediate_results_dir` flag is optional. By specifying it, the
93-
intermediate outputs of `make_examples_somatic` and `call_variants` stages can be found in the directory.
93+
intermediate outputs of `make_examples_somatic` and `call_variants` stages can
94+
be found in the directory.
9495

9596
```bash
9697
sudo docker pull pkrusche/hap.py:latest
@@ -112,7 +113,7 @@ The output:
112113

113114
```
114115
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
115-
0 indels 133 138 107 31 26 0 0 0.804511 0.730987 0.864953 0.804511 0.775362 0.700491 0.838825 0 0 248956422 0.124520
116-
1 SNVs 3440 3015 2844 171 596 0 0 0.826744 0.813825 0.839114 0.826744 0.943284 0.934599 0.951118 0 0 248956422 0.686867
117-
5 records 3573 3153 2951 202 622 0 0 0.825917 0.813222 0.838083 0.825917 0.935934 0.926985 0.944084 0 0 248956422 0.811387
116+
0 indels 133 140 106 34 27 0 0 0.796992 0.722688 0.858537 0.796992 0.757143 0.681350 0.822430 0 0 248956422 0.136570
117+
1 SNVs 3440 3018 2847 171 593 0 0 0.827616 0.814721 0.839960 0.827616 0.943340 0.934664 0.951167 0 0 248956422 0.686867
118+
5 records 3573 3158 2953 205 620 0 0 0.826476 0.813797 0.838627 0.826476 0.935085 0.926092 0.943282 0 0 248956422 0.823437
118119
```

docs/deepsomatic-case-study-ont-tumor-only.md

Lines changed: 4 additions & 4 deletions
@@ -68,7 +68,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6868
### Running on a CPU-only machine
6969

7070
```bash
71-
BIN_VERSION="1.9.0"
71+
BIN_VERSION="1.10.0"
7272

7373
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
7474

@@ -122,7 +122,7 @@ The output:
122122

123123
```
124124
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
125-
0 indels 133 183 66 117 67 0 0 0.496241 0.412115 0.580528 0.496241 0.360656 0.293704 0.431976 0 0 248956422 0.469962
126-
1 SNVs 3440 4389 2623 1766 817 0 0 0.762500 0.748063 0.776496 0.762500 0.597630 0.583062 0.612070 0 0 248956422 7.093611
127-
5 records 3573 4572 2689 1883 884 0 0 0.752589 0.738239 0.766529 0.752589 0.588145 0.573827 0.602352 0 0 248956422 7.563573
125+
0 indels 133 190 59 131 74 0 0 0.443609 0.361145 0.528498 0.443609 0.310526 0.247974 0.378802 0 0 248956422 0.526197
126+
1 SNVs 3440 4369 2618 1751 822 0 0 0.761047 0.746580 0.775074 0.761047 0.599222 0.584629 0.613683 0 0 248956422 7.033359
127+
5 records 3573 4559 2677 1882 896 0 0 0.749230 0.734820 0.763237 0.749230 0.587190 0.572847 0.601423 0 0 248956422 7.559556
128128
```

docs/deepsomatic-case-study-ont.md

Lines changed: 4 additions & 4 deletions
@@ -62,7 +62,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6262
### Running on a CPU-only machine
6363

6464
```bash
65-
BIN_VERSION="1.9.0"
65+
BIN_VERSION="1.10.0"
6666

6767
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
6868

@@ -112,7 +112,7 @@ The output:
112112

113113
```
114114
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
115-
0 indels 133 110 86 24 47 0 0 0.646617 0.562921 0.724009 0.646617 0.781818 0.697965 0.851069 0 0 248956422 0.096402
116-
1 SNVs 3440 2655 2611 44 829 0 0 0.759012 0.744506 0.773082 0.759012 0.983427 0.978031 0.987772 0 0 248956422 0.176738
117-
5 records 3573 2765 2697 68 876 0 0 0.754828 0.740520 0.768723 0.754828 0.975407 0.969127 0.980693 0 0 248956422 0.273140
115+
0 indels 133 103 87 16 46 0 0 0.654135 0.570673 0.730971 0.654135 0.844660 0.765604 0.904679 0 0 248956422 0.064268
116+
1 SNVs 3440 2663 2617 46 823 0 0 0.760756 0.746284 0.774789 0.760756 0.982726 0.977240 0.987165 0 0 248956422 0.184771
117+
5 records 3573 2766 2704 62 869 0 0 0.756787 0.742516 0.770643 0.756787 0.977585 0.971558 0.982614 0 0 248956422 0.249040
118118
```
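
When comparing the removed (1.9.0) and added (1.10.0) rows, a single score per row can be easier to read than separate recall and precision columns. The sketch below computes F1 from the `tp`, `fp`, and `fn` columns using the standard identity F1 = 2·tp / (2·tp + fp + fn). F1 is not reported in the output above, and the function name `f1_score` is hypothetical.

```bash
# Hypothetical helper: F1 from true positives, false positives, false negatives.
# Assumes at least one of the three counts is non-zero.
f1_score() {
  awk -v tp="$1" -v fp="$2" -v fn="$3" \
    'BEGIN { printf "%.6f\n", 2 * tp / (2 * tp + fp + fn) }'
}

# indels rows of the ONT table above, passed as: tp fp fn
f1_score 86 24 47   # 1.9.0 row
f1_score 87 16 46   # 1.10.0 row
```

For these two rows the function prints 0.707819 and 0.737288, so the indel F1 on this sample is higher in the 1.10.0 row.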

docs/deepsomatic-case-study-pacbio-tumor-only.md

Lines changed: 4 additions & 4 deletions
@@ -68,7 +68,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6868
### Running on a CPU-only machine
6969

7070
```bash
71-
BIN_VERSION="1.9.0"
71+
BIN_VERSION="1.10.0"
7272

7373
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
7474

@@ -122,7 +122,7 @@ The output:
122122

123123
```
124124
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
125-
0 indels 133 185 84 101 49 0 0 0.631579 0.547480 0.710021 0.631579 0.454054 0.383486 0.526047 0 0 248956422 0.405693
126-
1 SNVs 3440 4816 3212 1604 228 0 0 0.933721 0.925041 0.941671 0.933721 0.666944 0.653535 0.680151 0 0 248956422 6.442895
127-
5 records 3573 5001 3296 1705 277 0 0 0.922474 0.913362 0.930902 0.922474 0.659068 0.645841 0.672111 0 0 248956422 6.848588
125+
0 indels 133 176 80 96 53 0 0 0.601504 0.516849 0.681794 0.601504 0.454545 0.382223 0.528349 0 0 248956422 0.385610
126+
1 SNVs 3440 4737 3169 1568 271 0 0 0.921221 0.911863 0.929870 0.921221 0.668989 0.655488 0.682283 0 0 248956422 6.298291
127+
5 records 3573 4913 3249 1664 324 0 0 0.909320 0.899573 0.918404 0.909320 0.661307 0.647981 0.674442 0 0 248956422 6.683901
128128
```

docs/deepsomatic-case-study-pacbio.md

Lines changed: 4 additions & 4 deletions
@@ -62,7 +62,7 @@ DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variant
6262
### Running on a CPU-only machine
6363

6464
```bash
65-
BIN_VERSION="1.9.0"
65+
BIN_VERSION="1.10.0"
6666

6767
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
6868

@@ -112,7 +112,7 @@ The output:
112112

113113
```
114114
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
115-
0 indels 133 150 114 36 19 0 0 0.857143 0.790241 0.908710 0.857143 0.76000 0.687131 0.822947 0 0 248956422 0.144604
116-
1 SNVs 3440 3349 3228 121 212 0 0 0.938372 0.929965 0.946042 0.938372 0.96387 0.957144 0.969795 0 0 248956422 0.486029
117-
5 records 3573 3499 3342 157 231 0 0 0.935348 0.926931 0.943061 0.935348 0.95513 0.947891 0.961617 0 0 248956422 0.630632
115+
0 indels 133 143 113 30 20 0 0 0.849624 0.781637 0.902599 0.849624 0.790210 0.718079 0.850732 0 0 248956422 0.120503
116+
1 SNVs 3440 3328 3211 117 229 0 0 0.933430 0.924734 0.941398 0.933430 0.964844 0.958177 0.970703 0 0 248956422 0.469962
117+
5 records 3573 3471 3324 147 249 0 0 0.930311 0.921612 0.938313 0.930311 0.957649 0.950564 0.963972 0 0 248956422 0.590465
118118
```
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
1+
# DeepSomatic WES tumor-only case study
2+
3+
In this case study, we show an example of running DeepSomatic on WES
4+
tumor-only data. We use HCC1395 as an example for this case study.
5+
6+
## Data details
7+
8+
For this case-study, we use HCC1395 as an example. We run the analysis on `chr1`
9+
that we hold out during training.
10+
11+
Please see the [metrics page](metrics.md) for details on runtime and data.
12+
13+
## Prepare environment
14+
15+
### Tools
16+
17+
[Docker](https://docs.docker.com/get-docker/) will be used to run DeepSomatic
18+
and [hap.py](https://github.com/illumina/hap.py).
19+
20+
### Download input data
21+
22+
We will be using GRCh38 for this case study.
23+
24+
25+
```bash
26+
BASE="${HOME}/deepsomatic-wes-tumor-only-case-study"
27+
28+
# Set up input and output directory data
29+
INPUT_DIR="${BASE}/input/data"
30+
OUTPUT_DIR="${BASE}/output"
31+
32+
## Create local directory structure
33+
mkdir -p "${INPUT_DIR}"
34+
mkdir -p "${OUTPUT_DIR}"
35+
mkdir -p "${OUTPUT_DIR}/sompy_output"
36+
37+
# Download bam files to input directory
38+
HTTPDIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/deepsomatic-chr1-case-studies
39+
# Download the reference files
40+
curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna
41+
curl ${HTTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai > ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna.fai
42+
43+
# Download the bam files
44+
curl ${HTTPDIR}/HCC1395_wes.tumor.chr1.bam > ${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam
45+
curl ${HTTPDIR}/HCC1395_wes.tumor.chr1.bam.bai > ${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam.bai
46+
47+
# Download truth VCF
48+
DATA_HTTP_DIR=https://storage.googleapis.com/deepvariant/deepsomatic-case-studies/SEQC2-S1395-truth
49+
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/High-Confidence_Regions_v1.2.bed
50+
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz
51+
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz.tbi
52+
wget -P ${INPUT_DIR} "${DATA_HTTP_DIR}"/seqc2_hg38.exome_regions.bed
53+
```
54+
55+
## Running DeepSomatic with one command
56+
57+
DeepVariant pipeline consists of 3 steps: `make_examples_somatic`, `call_variants`, and
58+
`postprocess_variants`. You can run DeepSomatic with one command using the
59+
`run_deepsomatic` script.
60+
61+
### Running on a CPU-only machine
62+
63+
```bash
64+
BIN_VERSION="1.10.0"
65+
66+
sudo docker pull google/deepsomatic:"${BIN_VERSION}"
67+
68+
sudo docker run \
69+
-v ${INPUT_DIR}:${INPUT_DIR} \
70+
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
71+
google/deepsomatic:"${BIN_VERSION}" \
72+
run_deepsomatic \
73+
--model_type=WES_TUMOR_ONLY \
74+
--ref=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \
75+
--reads_tumor=${INPUT_DIR}/HCC1395_wes.tumor.chr1.bam \
76+
--output_vcf=${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \
77+
--sample_name_tumor="HCC1395Tumor" \
78+
--num_shards=$(nproc) \
79+
--logging_dir=${OUTPUT_DIR}/logs \
80+
--intermediate_results_dir=${OUTPUT_DIR}/intermediate_results_dir \
81+
--use_default_pon_filtering=true \
82+
--regions=chr1
83+
```
84+
85+
NOTE: If you want to run each of the steps separately, add `--dry_run=true`
86+
to the command above to figure out what flags you need in each step. Based on
87+
the different model types, different flags are needed in the `make_examples`
88+
step.
89+
90+
`--intermediate_results_dir` flag is optional. By specifying it, the
91+
intermediate outputs of `make_examples_somatic` and `call_variants` stages can
92+
be found in the directory.
93+
94+
```bash
95+
sudo docker pull pkrusche/hap.py:latest
96+
# Run hap.py
97+
sudo docker run \
98+
-v ${INPUT_DIR}:${INPUT_DIR} -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
99+
pkrusche/hap.py:latest \
100+
/opt/hap.py/bin/som.py \
101+
-N ${INPUT_DIR}/high-confidence_sINDEL_sSNV_in_HC_regions_v1.2.1.merged.vcf.gz \
102+
${OUTPUT_DIR}/HCC1395_deepsomatic_output.vcf.gz \
103+
-r ${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.chr1.fna \
104+
-o ${OUTPUT_DIR}/sompy_output/deepsomatic.chr1.sompy.output \
105+
--feature-table generic \
106+
-R ${INPUT_DIR}/High-Confidence_Regions_v1.2.bed \
107+
-T ${INPUT_DIR}/seqc2_hg38.exome_regions.bed \
108+
-l chr1
109+
```
110+
111+
The output:
112+
113+
```
114+
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
115+
0 indels 7 5 4 1 3 0 0 0.571429 0.234501 0.861136 0.571429 0.800000 0.371374 1.000000 0 0 248956422 0.004017
116+
1 SNVs 145 56 47 9 98 0 0 0.324138 0.252009 0.403209 0.324138 0.839286 0.727202 0.917389 0 0 248956422 0.036151
117+
5 records 152 61 51 10 101 0 0 0.335526 0.264116 0.413134 0.335526 0.836066 0.728674 0.912475 0 0 248956422 0.040168
118+
```
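
For the rows shown here, the `recall` and `precision` columns match the counts: recall = tp / (tp + fn) and precision = tp / (tp + fp). The sketch below recomputes both so that a row can be sanity-checked by hand. The function name `sompy_rates` is hypothetical and is not part of hap.py or DeepSomatic.

```bash
# Hypothetical helper: print "recall precision" for one row of the table.
# Assumes tp + fn and tp + fp are both non-zero.
sompy_rates() {
  awk -v tp="$1" -v fp="$2" -v fn="$3" \
    'BEGIN { printf "%.6f %.6f\n", tp / (tp + fn), tp / (tp + fp) }'
}

# Rows from the table above, passed as: tp fp fn
sompy_rates 4 1 3     # indels
sompy_rates 47 9 98   # SNVs
```

The two calls print `0.571429 0.800000` and `0.324138 0.839286`, which agree with the `recall` and `precision` values in the indels and SNVs rows.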
