
HW 7 -- Continuous graphical models

The questions below are due on Monday November 06, 2023; 11:59:00 PM.
 

This week's Colab notebook can be found here: Colab Notebook (Click Me!)

1) Time Series§

1.1) HMMs and Kalman Filters§

Indicate which of the following statements are true:
The forward-backward algorithm for HMMs is exactly the same algorithm as the Viterbi algorithm.
The forward-backward algorithm for HMMs is a special case of the sum-product algorithm.
The Viterbi algorithm for HMMs is a special case of the max-sum algorithm, except that in max-sum, the computation is with logs of the potentials instead of the potentials themselves.
Forward-backward is linear in the number of time steps.
Filtering to maintain a state estimate after each new observation requires us to store the history of observations.
Smoothing to update estimates of past states after each new observation requires us to store the history of observations.
Kalman filtering requires, for each time step, a number of multiplication operations that is roughly cubic in the number of state variables.
HMM filtering requires, for each time step, a number of multiplication operations that is roughly cubic in the number of state variables.
Kalman filtering can be used to determine the true most likely state estimate at time t when the state is continuous and the transition and observation dynamics are linear and Gaussian.
In Kalman filtering, if we only care about the maximum likelihood state estimate at time t, and not the covariance of our estimate, we don't need to compute the covariances at previous time steps.
HMMs are an exact solution for the most likely state estimate at time t when the state is continuous and the transition and observation dynamics are linear and Gaussian.
If we are using the HMM to make decisions at each timestep t, we only need to store the marginal over the state s_{t-1}, the transition potential from s_{t-1} to s_t and the measurement potential from s_t to z_t.

1.2) Properties of Viterbi§

Why do we need the backward message passing in Viterbi decoding? During the forward pass, we are computing the maximum likelihood state s_t at each time step. Why don't we just remember what the maximum likelihood state was at each time step and keep that as the sequence?

Consider the HMM below. Please give a sequence of three observations for which taking the maximum likelihood state at each step would give the wrong answer, so that you need the backward messages. Your answer should be a python list of three True/False values.

Assume that the initial distribution is uniform.

Enter a python list of True and False values, e.g., [True, True, True].
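For intuition about the mechanics (not the answer to the question above), here is a minimal sketch of Viterbi decoding on a hypothetical two-state HMM with True/False observations. The transition and observation probabilities below are made up and are not those of the HMM in the figure; the point is that the backpointers recorded during the forward pass are exactly what the backward pass follows.

    import numpy as np

    # Hypothetical two-state HMM (illustrative numbers only, NOT the HMM above).
    T = np.array([[0.9, 0.1],    # T[i, j] = P(s_t = j | s_{t-1} = i)
                  [0.2, 0.8]])
    O = np.array([[0.7, 0.3],    # O[i, 0] = P(z = True | s = i), O[i, 1] = P(z = False | s = i)
                  [0.4, 0.6]])
    prior = np.array([0.5, 0.5])

    def viterbi(observations):
        """Return the jointly most likely state sequence for a list of True/False observations."""
        obs = [0 if z else 1 for z in observations]
        m = prior * O[:, obs[0]]             # max-product forward message at t = 0
        backptrs = []
        for z in obs[1:]:
            scores = m[:, None] * T          # scores[i, j]: best path ending with transition i -> j
            backptrs.append(scores.argmax(axis=0))
            m = scores.max(axis=0) * O[:, z]
        path = [int(m.argmax())]             # best final state
        for bp in reversed(backptrs):        # backward pass: follow the stored backpointers
            path.append(int(bp[path[-1]]))
        return list(reversed(path))

    print(viterbi([True, False, True]))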

2) Kalman Filtering§

Consider the following one-dimensional continuous-state system:

x_t = a x_{t-1} + u_{t-1} + w_t

y_t = h x_t + v_t

where w_t \sim N(0, \sigma_w^2) and v_t \sim N(0, \sigma^2_v).

Let us say that a = 2, h = 10, \sigma^2_w = 1 and \sigma^2_v = .25. Let us also say that the initial estimate of the state is

p(x_0) = N(\mu_0, \sigma_0^2) = N(0, 1)
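For reference, one standard way to write the resulting 1D Kalman filter recursions for this model (a textbook result, stated here without derivation) is, for the predict step after applying control u_{t-1},

\bar{\mu}_t = a \mu_{t-1} + u_{t-1}, \qquad \bar{\sigma}^2_t = a^2 \sigma^2_{t-1} + \sigma^2_w

and, for the update step after receiving measurement y_t,

K_t = \frac{h \bar{\sigma}^2_t}{h^2 \bar{\sigma}^2_t + \sigma^2_v}, \qquad \mu_t = \bar{\mu}_t + K_t (y_t - h \bar{\mu}_t), \qquad \sigma^2_t = (1 - K_t h) \bar{\sigma}^2_t

The expected measurement after the predict step is h \bar{\mu}_t.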

  1. Given a single action u_0 = 1, what is the mean of the posterior?

    Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

  2. Given a single action u_0 = 1, what is the variance of the posterior?

    Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

  3. After the action u_0 = 1, what is the expected measurement?

    Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

  4. Given a subsequent measurement y_1 = 0, what is the variance of the posterior?

    Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

  5. Given a subsequent measurement y_1 = 0, what is the mean of the posterior?

    Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

  6. The covariance of the posterior will ...

    always get smaller over time
    always get bigger over time
    get bigger or smaller over time depending on the ratio of a to h
    get bigger or smaller over time depending on the ratio of \sigma_w to \sigma_v

3) Coding: Kalman Filter§

Let's implement the Kalman Filter for the one-dimensional continuous-state system from the previous section.

3.1) Kalman Filter Warmup§

Return a Gaussian object with mean=0, std=2. Please see the provided code for the definition of Gaussian.

For reference, our solution is 1 line(s) of code.

3.2) Kalman filter process model§

Implement the process step of a 1D Kalman filter, given a Gaussian prior distribution, a process model a, \sigma^2_w, and a control u.

For reference, our solution is 4 line(s) of code.
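As a rough guide, a sketch of what the predict step could look like, assuming a hypothetical Gaussian container with mean and std fields (the notebook's actual Gaussian class, and the expected function signature, may differ):

    from dataclasses import dataclass
    import math

    @dataclass
    class Gaussian:          # hypothetical stand-in for the notebook's Gaussian class
        mean: float
        std: float

    def kf_process(prior, a, sigma_w_sq, u):
        """Predict step for x_t = a x_{t-1} + u + w_t with w_t ~ N(0, sigma_w_sq)."""
        mean = a * prior.mean + u
        var = (a ** 2) * prior.std ** 2 + sigma_w_sq
        return Gaussian(mean, math.sqrt(var))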

3.3) Kalman filter measurement model§

Implement the measurement step of a 1D Kalman filter, given a prior mean and variance, a measurement model h, \sigma^2_v, and a received measurement y.

For reference, our solution is 18 line(s) of code.
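Continuing with the hypothetical Gaussian stand-in from the sketch in 3.2, the update step could look roughly like this; following the problem statement, it takes the prior mean and variance separately, but the notebook's expected interface may differ in detail:

    def kf_measurement(prior_mean, prior_var, h, sigma_v_sq, y):
        """Update step for y_t = h x_t + v_t with v_t ~ N(0, sigma_v_sq)."""
        K = h * prior_var / (h ** 2 * prior_var + sigma_v_sq)   # Kalman gain
        mean = prior_mean + K * (y - h * prior_mean)
        var = (1 - K * h) * prior_var
        return Gaussian(mean, math.sqrt(var))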

3.4) Kalman filter§

Combine the process and measurement steps into a single estimator that takes a prior, a process model, a measurement model, a list of controls, and a list of measurements, and returns a posterior.

For reference, our solution is 9 line(s) of code.

In addition to all the utilities defined at the top of the Colab notebook, the following functions are available in this question environment: kf_measurement, kf_process. You may not need to use all of them.
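Putting the two sketches above together, a combined filter might look like the following. The argument layout is a guess; the kf_process and kf_measurement provided in the question environment may take their arguments differently.

    def kalman_filter(prior, a, sigma_w_sq, h, sigma_v_sq, controls, measurements):
        """Alternate predict and update steps over paired controls and measurements."""
        belief = prior
        for u, y in zip(controls, measurements):
            belief = kf_process(belief, a, sigma_w_sq, u)
            belief = kf_measurement(belief.mean, belief.std ** 2, h, sigma_v_sq, y)
        return belief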

3.5) Kalman Filter Visualization§

We have created a visualization tool in Colab for the filtering process. You can run the code blocks and an animation of the filtered Gaussian at each step will be shown. There is also a line chart of the ground-truth locations, measurements, and posteriors.

4) Sampling§

4.1) Sampling types§

Indicate which of the following statements are true:
In the limit of infinite samples, the statistics we compute from a set of samples generated by Gibbs sampling, such as the mean of a variable, match the statistics of the underlying distribution.
It is possible to know ahead of time how many samples will be needed for importance sampling.
For some samplers, it is possible to know at execution time, by measuring the total weight of the samples collected so far, whether more samples are needed.
When computing importance weights w(X) = P(X)/Q(X) for a sample X, it is important that P(X) be correctly normalized.
It is important that the potentials in the graphical model are correctly normalized for sampling to work.
We need to know ahead of time the statistics of the target distribution when designing the proposal distribution.
A bad proposal distribution means more samples from the proposal distribution will be needed to get a good estimate of the target distribution.
We don't need to worry about the direction of arrows when Gibbs sampling in a Bayes net.
The samples generated by Gibbs sampling are independent and identically distributed.

4.2) Sampling efficiency§

Kullback-Leibler divergence is a measure of how different two distributions are, and for discrete distributions it is given by

D_{KL}(P||Q) = \sum_{x\in {\mathcal {X}}} P(x) \log \left ( \frac{P(x)}{Q(x)} \right )

Let us assume that P is the target distribution, and Q is the proposal distribution, and we are performing rejection sampling.
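As a concrete illustration of the formula, here is a tiny computation of D_{KL}(P\|Q) for two made-up discrete distributions (the numbers are purely illustrative):

    import math

    P = {'a': 0.5, 'b': 0.4, 'c': 0.1}   # target distribution
    Q = {'a': 0.3, 'b': 0.3, 'c': 0.4}   # proposal distribution

    def kl_divergence(P, Q):
        """D_KL(P || Q) for discrete distributions given as dicts over the same support."""
        return sum(p * math.log(p / Q[x]) for x, p in P.items() if p > 0)

    print(kl_divergence(P, Q))   # about 0.23; note that kl_divergence(Q, P) is generally different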

What does this mean for our sampling strategy?

Select all of these that are true.
When Q(x) is small for x\in {\mathcal {X}} where P(x) is large, D_{KL}(P||Q) will always be large.
When Q(x) is large for x\in {\mathcal {X}} where P(x) is small, D_{KL}(P||Q) will always be large.
When D_{KL}(P||Q) is large, the sampling strategy will be very good. P and Q are closely aligned.
When D_{KL}(P||Q) is small, the sampling strategy will be very good no matter what. P and Q are closely aligned.
Our sampling strategy will be efficient when the KL divergence is small and P(x) isn't small or zero for x where Q(x) is large.
If Q has zero probability in places where P is non-zero, we will not get a representative set of samples.

5) Particle Filter§

Consider a domain in which the forward transition dynamics are “hybrid” in the sense that

P(X_t = x_t \mid X_{t-1} = x_{t-1}) = p * N(x_{t-1} + 1, 0.1)(x_t) + (1-p) * N(x_{t-1}-1, 0.1)(x_t)

that is, that the state will hop forward one unit in expectation with probability p, or backward one unit in expectation with probability 1-p, with variance 0.1 in each case.

Assume additionally that the observation model is P(Y_t = y_t \mid X_t = x_t) = \mathtt{Uniform}(x_{t}-1, x_{t}+1)(y_t).
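As a sketch of this generative model (the function names and organization are mine, not the notebook's), sampling one transition and one observation could look like this:

    import random

    def sample_transition(x_prev, p):
        """Sample x_t: hop +1 with probability p, -1 with probability 1-p, variance 0.1."""
        hop = 1.0 if random.random() < p else -1.0
        return random.gauss(x_prev + hop, 0.1 ** 0.5)   # std = sqrt(variance)

    def sample_observation(x):
        """Sample y_t uniformly from [x_t - 1, x_t + 1]."""
        return random.uniform(x - 1.0, x + 1.0)

    x_prev = 0.0
    x_t = sample_transition(x_prev, p=0.5)
    y_t = sample_observation(x_t)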

  1. What is E[X_t \mid X_{t-1} = x]?

    x-2
    x + 2p - 1
    p^2 * x
    p * x

  2. Assuming p=0.5, what is Var[X_t \mid X_{t-1} = x]?

    0.1
    0.2
    1.1
    0.05

  3. You know the initial state of the system X_0 = 0. Your friend Norm thinks it’s fine to initialize a particle filter with a single particle at 0. What do you think?

    This is fine and we can continue with a single particle.
    We should initialize our pf with N copies of this particle.

  4. Norm runs the filter for two steps with no observations several times and is trying to decide whether there could be bugs in the code. Assuming p=0.5, for each of the following sets of particles, indicate whether it is (a) fairly likely (b) quite unlikely (c) completely impossible:

    1. {-2.05, -1.95, -0.1, 0.1, 1.9, 2.1}

      a
      b
      c

    2. {-2.01, -1.9, -1.0, 0.1, 0, 2.1}

      a
      b
      c

    3. {-20, -2.01, -2.001, .01, .001, 1.99, 1.999}

      a
      b
      c

  5. Norm initializes the filter with particles {-2.05, -1.95, -0.1, 0.1, 1.9, 2.1} and then gets an observation of -1.0. Which of the following is a plausible posterior, assuming resampling?

    {-1.95, -0.1, -1.95, -1.95, -0.1, -0.1, -0.1}
    {-1.95, -.01}
    {0.1, 0.1, 0.1, -0.1, -0.1, -0.1}
    {-1.96, -0.11, -1.94, -1.97, -0.11, -0.09, -0.12}

5.1) Particle Filter Code and Visualization§

We've created a visualization and code simulation for this problem. You can run the code block to apply the particle filter to the problem, experiment with different parameters, and plot the results.

6) Feedback§

  1. How many hours did you spend on this homework?

  2. Do you have any comments or suggestions (about the problems or the class)?