|
179 | 179 | "The number of effective samples are counted at each checkpoint. For this reason, a checkpoint-interval must be provided if `effective-nsamples` is set.\n", |
180 | 180 | "\n", |
181 | 181 | "#### Max samples per chain\n", |
182 | | - "If `max-samples-per-chain` is provided, `pycbc_inference` will ensure that no more than the given number of samples per chain are stored in the output file. Samples will be thinned on disk and in memory when a checkpoint happens to ensure this. This is important for keeping file size down. Without it, a GW run with `1000` walkers and `4` temps can result in a file that is over 100GB, since every sample will be saved. With `max-samples-per-chain = 1000`, the maximum file size is capped to ~1GB." |
| 182 | + "If `max-samples-per-chain` is provided, `pycbc_inference` will ensure that no more than the given number of samples per chain are stored in the output file. Samples will be thinned on disk and in memory when a checkpoint happens to ensure this. This is important for keeping file size down. Without it, a GW run with `200` walkers and `20` temps can result in a file that is over 100GB, since every sample will be saved. With `max-samples-per-chain = 1000`, the maximum file size is capped to ~1GB." |
183 | 183 | ] |
184 | 184 | }, |
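| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "For concreteness, a `[sampler]` section using these options together might look like the sketch below. The option names are as described above, but the sampler name, walker/temperature counts, and interval values are illustrative placeholders, not necessarily this tutorial's actual configuration:\n",
| | + "```ini\n",
| | + "[sampler]\n",
| | + "name = emcee_pt\n",
| | + "nwalkers = 200\n",
| | + "ntemps = 20\n",
| | + "; stop once this many independent samples have been collected;\n",
| | + "; requires checkpoint-interval, since the count happens at checkpoints\n",
| | + "effective-nsamples = 1000\n",
| | + "checkpoint-interval = 2000\n",
| | + "; thin on disk and in memory to cap the output file size\n",
| | + "max-samples-per-chain = 1000\n",
| | + "```"
| | + ]
| | + },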
185 | 185 | { |
|
198 | 198 | "\n", |
199 | 199 | "The settings for loading the data are in [data.ini](data.ini). This contains a `[data]` section, which is read by the GaussianNoise model to figure out what data to load, and how to condition. Here is what each of the settings that we have does:\n", |
200 | 200 | " * `instruments`: This tells the code what detectors to analyze. Here, we've set it to `H1` and `L1`.\n", |
201 | | - " * `trigger-time`, `analysis-start-time` and `analysis-end-time`: The `analysis-(start|end)-time` options determine the time that will be analyzed. Notice that the start-time is `-6` and the end time is `2`. This is because these times are measured with respect to the `trigger-time` option. Here, we put an estimate of the GPS time when the binary black hole merger occurred (in a Geocentric reference frame). With these settings, our analyzed time will start 6 seconds before the merger time and end 2 seconds after.\n", |
| 201 | + " * `trigger-time`, `analysis-start-time` and `analysis-end-time`: The `analysis-(start|end)-time` options determine the time that will be analyzed. Notice that the start-time is `-8` and the end time is `2`. This is because these times are measured with respect to the `trigger-time` option. Here, we put an estimate of the GPS time when the binary black hole merger occurred (in a Geocentric reference frame). With these settings, our analyzed time will start 8 seconds before the merger time and end 2 seconds after.\n", |
202 | 202 | " * `psd-estimation`: This determines how we will estimate the PSD. By setting it to `median-mean`, the PSD will be analyzed from the data using a Welch-like method. Basically, the data is chopped up into semi-overlapping segments, an FFT is taken in each block, then the median is taken over all odd-numbered segments. The same process is repeated for the even-numbered segments. The two sets are then averaged to give the PSD.\n", |
203 | 203 | " * `psd-start-time` and `psd-end-time`: This defines the analysis block that is used for estimating the PSD. To get a good estimate, you generally want to use ~512s of data. Here, we use 512s centered on the trigger time.\n", |
204 | 204 | " * `psd-segment-length` and `psd-segment-stride`: These determine the size of each segment when doing the median-mean method, and how much each segment overlaps.\n", |
|
207 | 207 | " * `channel-name`: The name of the channels in the frame files containing the gravitational-wave data to analyze.\n", |
208 | 208 | " * `sample-rate`: The sample rate we will use for the analysis. You want this to be atleast twice the maximum frequency of any possible waveform that will be generated by your prior. For BBH, 2048Hz is generally ok.\n", |
209 | 209 | " * `strain-high-pass`: Causes a high-pass filter to be applied to the data when it is first loaded, with the cutoff frequency (here) set to 15Hz. This is just to remove the large amplitude low-frequency noise, so as not to cause numerical overflow issues when calculating the likelihood. Generally, you want this to be a few Hz lower than the low-frequency-cutoff used in the model.\n", |
210 | | - " * `pad-data`: Adds an extra few seconds on to the data when loading. This is to avoid corruption issues from the `strain-high-pass` filter. The padded data are removed after the high-pass filter is applied, and before any FFTs are done.\n", |
| 210 | + " * `pad-data`: Adds an extra few seconds on to the data when loading. This is to avoid corruption issues from the `strain-high-pass` filter. The padded data are removed after the high-pass filter is applied, and before any FFTs are done." |
| 211 | + ] |
| 212 | + }, |
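| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "Putting these options together, the `[data]` section has roughly the following shape. This is a sketch for orientation only: the trigger time, channel names, padding, and PSD segment length/stride shown here are illustrative placeholders rather than values taken from the actual [data.ini](data.ini):\n",
| | + "```ini\n",
| | + "[data]\n",
| | + "instruments = H1 L1\n",
| | + "; placeholder GPS estimate of the merger time\n",
| | + "trigger-time = 1126259462.43\n",
| | + "; measured relative to trigger-time\n",
| | + "analysis-start-time = -8\n",
| | + "analysis-end-time = 2\n",
| | + "psd-estimation = median-mean\n",
| | + "; ~512s centered on the trigger time\n",
| | + "psd-start-time = -256\n",
| | + "psd-end-time = 256\n",
| | + "psd-segment-length = 8\n",
| | + "psd-segment-stride = 4\n",
| | + "sample-rate = 2048\n",
| | + "strain-high-pass = 15\n",
| | + "pad-data = 8\n",
| | + "; placeholder channel names\n",
| | + "channel-name = H1:GWOSC-4KHZ_R1_STRAIN L1:GWOSC-4KHZ_R1_STRAIN\n",
| | + "```"
| | + ]
| | + },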
| 213 | + { |
| 214 | + "cell_type": "markdown", |
| 215 | + "metadata": {}, |
| 216 | + "source": [ |
| 217 | + "### Determining analysis time duration\n", |
| 218 | + "\n", |
| 219 | + "Why did we use -8 and +2 for the analysis times? The discrete inner product treats the data as if it were cyclic. If we try to filter a model waveform that is longer (starting from the low-frequency-cutoff) than the analysis duration, it will wrap around to the beginning. For example, if we used an analysis time of 4s, but a waveform is 5s long, the last second of the waveform will wrap around to lay on top of the first second of the segment. To avoid this, we need to analyze a segment that is longer than the longest possible waveform admitted by our prior plus our uncertainty in the trigger time ($\\pm 0.1\\,$s). In this case, our longest waveform is $m_1 = m_2 = 10\\,\\mathrm{M}_\\odot$. We can check the duration of this waveform using `get_waveform_filter_length_in_time`:" |
| 220 | + ] |
| 221 | + }, |
| 222 | + { |
| 223 | + "cell_type": "code", |
| 224 | + "execution_count": null, |
| 225 | + "metadata": {}, |
| 226 | + "outputs": [], |
| 227 | + "source": [ |
| 228 | + "from pycbc import waveform" |
| 229 | + ] |
| 230 | + }, |
| 231 | + { |
| 232 | + "cell_type": "code", |
| 233 | + "execution_count": null, |
| 234 | + "metadata": {}, |
| 235 | + "outputs": [], |
| 236 | + "source": [ |
| 237 | + "waveform.get_waveform_filter_length_in_time(approximant='IMRPhenomPv2', mass1=10., mass2=10.,\n", |
| 238 | + " spin1z=0.99, spin2z=0.99, f_lower=20.)" |
| 239 | + ] |
| 240 | + }, |
| 241 | + { |
| 242 | + "cell_type": "markdown", |
| 243 | + "metadata": {}, |
| 244 | + "source": [ |
| 245 | + "Since the `trigger-time` is near the merger time, using ~8s before and ~2s after sufficiently encompasses the longest waveform we might sample.\n", |
211 | 246 | "\n", |
212 | 247 | "### Challenge:\n", |
213 | | - " * Why did we use -6 and +2 for the analysis times for this analysis? When would we change these values, and why?" |
| 248 | + "\n", |
| 249 | + "What analysis-time settings would you use if your prior on mass1 and mass2 was uniform in $[5, 40)\\,\\mathrm{M}_\\odot$, and you were starting from 20Hz? What if you start from 15Hz?" |
214 | 250 | ] |
215 | 251 | }, |
216 | 252 | { |
|