HW 3 -- Temporal models
This week's Colab notebook can be found here: Colab Notebook (Click Me!)
1) Variable Elimination§
Implement Variable Elimination given a list of Potentials and a Variable order. You may use the functions multiply_potentials and marginalize from last week's homework.
For reference, our solution is 17 line(s) of code.
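A minimal sketch of one way to structure this, assuming (as in last week's homework) that multiply_potentials takes a list of Potentials, that marginalize(potential, variable) sums a variable out, and that each Potential exposes its variables through a .variables attribute — that attribute name is an assumption, not part of the provided code:

def variable_elimination(potentials, elimination_order):
    # Sketch only; relies on the assumed helpers described above.
    potentials = list(potentials)
    for variable in elimination_order:
        # Split off the potentials that mention this variable.
        involved = [p for p in potentials if variable in p.variables]
        rest = [p for p in potentials if variable not in p.variables]
        # Multiply them together and sum the variable out.
        summed = marginalize(multiply_potentials(involved), variable)
        potentials = rest + [summed]
    # The product of whatever remains is the (unnormalized) result over the remaining variables.
    return multiply_potentials(potentials)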
2) Gaussians§
2.1) §
A + B | |
10A - 12B | |
A \times B | |
The joint distribution of A and B |
Let A \sim \mathcal{N}(\mu_A, \sigma_{A}^{2}) and let B \mid A \sim \mathcal{N}(A, \sigma^{2}_{B \mid A}). We can show that the distribution of A \mid B = b is a univariate Gaussian.
For parsing purposes, enter \mu_A as mu_A, \sigma_{A} as sigma_A, \sigma_{B \mid A} as sigma_BA, b as b, and use the ^ character for exponentiation.
2.2) §
2.3) §
2.4) §
True | |
False |
2.5) §
True | |
False |
3) Continuous PGMs §
Consider a Bayesian network with three one-dimensional continuous variables: A, B, and C. Let A \sim \mathcal{N}(1, 1), B \sim \mathcal{N}(2, 4), and C \sim \mathcal{N}(A + B, 1).

3.1) §
3.2) §
3.3) §
3.4) §
Hint: We can apply linearity of covariance to compute Cov(A,C) and Cov(B, C).
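As a reminder, the identity behind this hint is that covariance is linear in each argument: for random variables X, Y, Z and constants a, b,
Cov(X, aY + bZ) = a\,Cov(X, Y) + b\,Cov(X, Z), \quad \text{with} \quad Cov(X, X) = Var(X).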
3.5) §
3.6) §
3.7) §
Hint: See Lec05 slide 22 for the general formula of the covariance matrix.
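In case you don't have the slides handy, the standard definition is \Sigma_{ij} = Cov(X_i, X_j); for example, for two variables,
\Sigma = \begin{pmatrix} Var(X_1) & Cov(X_1, X_2) \\ Cov(X_2, X_1) & Var(X_2) \end{pmatrix}.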
3.8) §
3.9) §
Independent | |
Not independent |
3.10) §
Conditionally independent | |
Not conditionally independent |
4) Sampling§
4.1) Sampling types§
In the limit of infinite samples, the statistics we compute from a set of samples generated by Gibbs sampling, such as the mean of a variable, match the statistics of the underlying distribution. | |
It is possible to know ahead of time how many samples will be needed for importance sampling. | |
For some samplers, it is possible to know at execution time, by measuring the total weight of the samples collected so far, whether more samples are needed. | |
When computing importance weights w(X) = P(X)/Q(X) for a sample X, it is important that P(X) be correctly normalized. | |
It is important that the potentials in the graphical model are correctly normalized for sampling to work. | |
We need to know ahead of time the statistics of the target distribution when designing the proposal distribution for importance sampling. | |
A bad proposal distribution means more samples from the proposal distribution will be needed to get a good estimate of the target distribution. | |
We don't need to worry about the direction of arrows when Gibbs sampling in a Bayes net. | |
The samples generated by Gibbs sampling are independent and identically distributed. |
4.2) Sampling efficiency§
Kullback-Leibler (KL) divergence measures how different two distributions are; for discrete distributions it is given by D_{KL}(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.
Let us assume that P is the target distribution, and Q is the proposal distribution, and we are performing rejection sampling.
What does this mean for our sampling strategy?
When Q(x) is small for x\in {\mathcal {X}} where P(x) is large, D_{KL}(P||Q) will always be large. | |
When Q(x) is large for x\in {\mathcal {X}} where P(x) is small, D_{KL}(P||Q) will always be large. | |
When D_{KL}(P||Q) is large, the sampling strategy will be very good. P and Q are closely aligned. | |
When D_{KL}(P||Q) is small, the sampling strategy will be very good no matter what. P and Q are closely aligned. | |
Our sampling strategy will be efficient when the KL divergence is small and P(x) isn't small or zero for x where Q(x) is large. | |
If Q has zero probability in places where P is non-zero, we will not get a representative set of samples. |
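To build intuition for the statements above, here is a small self-contained numerical sketch; the distributions P, Q1, and Q2 are made up purely for illustration:

import math

# Toy discrete distributions over the same support.
P  = {"a": 0.5, "b": 0.4, "c": 0.1}     # target distribution
Q1 = {"a": 0.4, "b": 0.4, "c": 0.2}     # proposal that covers P well
Q2 = {"a": 0.05, "b": 0.05, "c": 0.9}   # proposal that rarely proposes where P has mass

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x))
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

print(kl_divergence(P, Q1))  # about 0.04: small, so few proposals are wasted
print(kl_divergence(P, Q2))  # about 1.76: large, so the high-probability region of P is rarely proposed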
5) Sampling§
5.1) Gibbs Sampling§
Consider the following 3 node factor graph:
The table on the right gives three different potential tables for the factor. For one of these potential tables, Gibbs sampling did a poor job of generating samples that reflect the joint distribution of the three variables. Which one was it?
a | |
b | |
c |
5.2) Sampling in Bayes net§
Consider the following Bayes net.
We are interested in drawing samples from the distribution described by this network, conditioned on evidence, using ancestral sampling with rejection.
- Is this a good way to sample from P(K \mid B = b)? (True / False)
- Is this a good way to sample from P(B \mid K = k)? (True / False)
5.3) Markov Blanket§
Consider the Bayes net from the above question.

- Enter the set of nodes in the Markov Blanket of node G, e.g. ['B','D'].
- Enter the set of nodes in the Markov Blanket of node D, e.g. ['B','D'].
6) HMMs: Are my leftovers edible? §
I use an HMM to model the state of my leftover food over time. The state space is \{\textit{tasty}, \textit{smelly}, \textit{furry} \}. Initially, I'm sure they're tasty: P(S_0 = \textit{tasty}) = 1. The state transition model is:

6.1) State Distributions §
6.2) Message Passing§
Let \phi_{ij} be the factor between nodes S_i and S_j. Provide the values of the following messages (we calculated a couple of them for you). Give your answer as a Python list of 3 numbers, with the assumption that the possible values are, in order, (tasty, smelly, furry). We're asking for the messages in the order we would want them to compute P(S_2).
i. S_0 \rightarrow \phi_{01} = \left(1, 0, 0\right)
ii.
iii.
iv.
v. S_4 \rightarrow \phi_{34} = (0, 0, 1)
vi.
vii.
viii.
ix. \phi_{23} \rightarrow S_2 = \left(\frac{1}{9}, \frac{1}{2}, 1 \right)
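For reference, the messages above are instances of the standard sum-product updates on a chain:
\mu_{S_i \rightarrow \phi_{ij}}(s_i) = \prod_{\phi' \neq \phi_{ij}} \mu_{\phi' \rightarrow S_i}(s_i), where the product runs over the other factors attached to S_i, and
\mu_{\phi_{ij} \rightarrow S_j}(s_j) = \sum_{s_i} \phi_{ij}(s_i, s_j)\, \mu_{S_i \rightarrow \phi_{ij}}(s_i).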
6.3) Checking Message Passing §
To calculate P(S_2 \mid S_0 = \textit{tasty}, S_4 = \textit{furry}), we multiply the messages from (enter the answers in order of increasing day):
\phi_{01} | |
\phi_{12} | |
\phi_{23} | |
\phi_{34} |
and
\phi_{01} | |
\phi_{12} | |
\phi_{23} | |
\phi_{34} |
and then
normalize | |
don't normalize |
6.4) Sampling Formulas §
You decide to estimate P(S_2) by forward (prior) sampling. You generate 100 sample trajectories of length 3 for your leftovers. Given those trajectories, what are the three terms used to construct an estimate of P(S_2)?
(number of trajectories ending in tasty)/100 | |
(number of trajectories with tasty)/100 | |
(number of trajectories ending in smelly)/100 | |
(number of trajectories with smelly)/100 | |
(number of trajectories ending in furry)/100 | |
(number of trajectories with furry)/100 |
(number of trajectories ending in tasty)/100 | |
(number of trajectories with tasty)/100 | |
(number of trajectories ending in smelly)/100 | |
(number of trajectories with smelly)/100 | |
(number of trajectories ending in furry)/100 | |
(number of trajectories with furry)/100 |
(number of trajectories ending in tasty)/100 | |
(number of trajectories with tasty)/100 | |
(number of trajectories ending in smelly)/100 | |
(number of trajectories with smelly)/100 | |
(number of trajectories ending in furry)/100 | |
(number of trajectories with furry)/100 |
6.5) Kalman Filters Check §
You decide that three discrete values is not sufficient precision for describing the state of your leftovers! You would like to score them on a continuum from 0.0 (completely fresh and tasty) to 1.0 (totally decomposed). Would a Kalman filter be a good choice of solution for predicting the current state of your leftovers given some previous observations?
Yes | |
No |
7) Properties of Viterbi§
Maximillian thinks the Viterbi algorithm is too complicated and we could get away without the backward “decoding” pass, and instead just return the maximum of the marginal distribution at each time step.
You know better than Max! In the HMM below, provide a sequence of three observations for which Max will get the wrong answer, but Viterbi decoding will be correct. Your answer should be a Python list of three True/False values. Assume that the initial distribution is uniform.

e.g. [True, True, True].
8) Coding: Localization with Viterbi§
In this section, we will implement an HMM for a robot that is moving around randomly in a 2D grid with obstacles. The robot has sensors that allow it to detect obstacles in its immediate vicinity. It knows the grid map, with the locations of all obstacles, but it is uncertain about its own location in the grid. We will use Viterbi to determine the most likely locations for the robot given a sequence of local and potentially noisy observations.
Concretely, we will represent the 2D grid with obstacles as a list of lists, where 1s represent obstacles and 0s represent free space. Example:
obstacle_map = [
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1],
[1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
]
The state of the robot is its location in the grid, represented as a tuple of ints, (row, col). Transitions are uniformly distributed amongst the robot's current location and the neighboring free (not obstacle) locations, where neighboring = 4 cardinal directions (up, down, left, right).
Observations are a 4-tuple listing which directions have obstacles, in order [N, E, S, W], with a 1 for an obstacle and 0 for no obstacle. Directions that are "off the map" are reported as 1, as though they were obstacles. For instance, in the map above, if there were no observation noise, then the observation for the top-left corner (state=(0, 0)) would be (1, 0, 1, 1). Observations can also be corrupted with noise; see the create_observation_potential docstring for more details.
Our ultimate task will be to take in a sequence of observations and return the corresponding sequence of most likely states.
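As an ungraded warm-up for this representation, here is a minimal sketch of how the noiseless observation for a state can be read off the map; the function name is ours, not part of the provided code:

def noiseless_observation(obstacle_map, state):
    # Return the (N, E, S, W) tuple for `state`, treating off-map cells as obstacles.
    rows, cols = len(obstacle_map), len(obstacle_map[0])
    row, col = state
    observation = []
    for dr, dc in [(-1, 0), (0, 1), (1, 0), (0, -1)]:  # N, E, S, W
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols:
            observation.append(obstacle_map[r][c])  # 1 if obstacle, 0 if free
        else:
            observation.append(1)                   # off the map counts as an obstacle
    return tuple(observation)

print(noiseless_observation(obstacle_map, (0, 0)))  # (1, 0, 1, 1) for the example map above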
8.1) Conditioning Warmup§
Given a potential and a variable that occurs in it, return a new potential conditioned on the specified variable having value 0.
For reference, our solution is 2 line(s) of code.
8.2) HMM§
Write a function that creates a random variable for a state at a given time step in an obstacle HMM. The domain of the state variable should be a list of (row, col) indices into the map. Only free positions (not obstacles) should be included in the domain of the state variable. See docstring for more description.
For reference, our solution is 4 line(s) of code.
8.3) Transition potential creation§
Write a function that creates a potential for the transition distribution between states s_{t} and s_{t+1}. Refer to the previous question for more information about the state variables and their domains.
For reference, our solution is 13 line(s) of code.
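A sketch of the probabilities such a potential should encode, returning a plain dictionary rather than an actual Potential (constructing the Potential object itself is course-specific):

def transition_probs(obstacle_map, state):
    # {next_state: probability}, uniform over `state` itself and its free 4-neighbors.
    rows, cols = len(obstacle_map), len(obstacle_map[0])
    row, col = state
    candidates = [state]  # the robot may stay where it is
    for dr, dc in [(-1, 0), (0, 1), (1, 0), (0, -1)]:  # N, E, S, W
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols and obstacle_map[r][c] == 0:
            candidates.append((r, c))
    prob = 1.0 / len(candidates)
    return {next_state: prob for next_state in candidates}

print(transition_probs(obstacle_map, (0, 0)))  # {(0, 0): 0.5, (0, 1): 0.5}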
8.4) Observation variable creation§
Write a function that creates a random variable for an observation at a given time step in an obstacle HMM. See docstring for description.
For reference, our solution is 3 line(s) of code.
8.5) Observation potential creation§
Write a function that creates a potential for the observation distribution between s_{t} and z_{t}.
For reference, our solution is 29 line(s) of code.
8.6) Viterbi Decoding§
Complete the implementation of Viterbi decoding. Note: in general, it is a good idea for numerical stability to implement Viterbi in log space. However, for the purpose of this homework, we are not expecting your implementation to be in log space. Moreover, the provided helper functions (e.g., multiply_potentials) assume that you are not working in log space.
For convenience, we have the following utilities:
- We can access the initial distribution through hmm.initial_distribution.
- To initialize the forward message, we have condition_potential and multiply_potentials, which can be used to compute the observation message and multiply it into the initial distribution.
- We also have access to argmax_potential, max_potential, and normalize_potential; see the Colab file for implementation details.
For reference, our solution is 38 line(s) of code.
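For intuition only, here is a self-contained Viterbi sketch over plain dictionaries; it does not use the course's Variable/Potential classes, and all names here are ours. The graded version should instead use the helpers listed above.

def viterbi(states, initial, transition, emission, observations):
    # states: list of hidden states
    # initial[s] = P(S_0 = s); transition[s][s2] = P(S_{t+1} = s2 | S_t = s)
    # emission[s][z] = P(Z_t = z | S_t = s); observations = [z_0, ..., z_T]
    # Forward pass: best score of any path ending in each state, plus backpointers.
    scores = [{s: initial[s] * emission[s][observations[0]] for s in states}]
    backpointers = []
    for z in observations[1:]:
        prev = scores[-1]
        best_prev = {s: max(states, key=lambda s0: prev[s0] * transition[s0][s])
                     for s in states}
        scores.append({s: prev[best_prev[s]] * transition[best_prev[s]][s] * emission[s][z]
                       for s in states})
        backpointers.append(best_prev)
    # Backward "decoding" pass: follow the backpointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))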
8.7) Viterbi Visualization§
We have created a visualization tool for you in Colab to visualize the inference results. You can run the provided code blocks to generate the inferred trajectory step by step and to compare the observed trajectory with the "noiseless" observations.
If you are not passing the test, you can use the simple map visualize_viterbi_test1 for debugging.
Hint: one debugging strategy is to modify your Viterbi code and print out the likelihood you get at each step.
9) Feedback§
- How many hours did you spend on this homework?
- Do you have any comments or suggestions (about the problems or the class)?