HW 4 -- Kalman Filters and Propositional Logic

The questions below are due on Monday March 10, 2025; 11:59:00 PM.

You are not logged in.
Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.

This week's Colab notebook has two parts, one for the coding questions about Kalman filters, Kalman Filters Colab Notebook (Click Me!), and one for Propositional Logic coding questions Propositional Logic Colab Notebook (Click Me!).

1) Gibbs Sampling§

We are going to study Gibbs sampling in a concrete case, to illustrate some of the principles. Assume that we choose variables to change uniformly in the Gibbs procedure. Consider the following Bayes net with three binary variables, A, B, and C.

What's the joint distribution? We've filled in most of the values for you.

What is P(A=0, B=1, C=0)?

What is P(A=1, B=1, C=1)?

In Gibbs sampling, say we start with assignment (0, 0, 0). Let's calculate some probabilities of a single step.

What is the probability of a move to (1,1,0)?

What is the probability of a move to (1,0,0)?

What is the probability of a move to (0,0,0)?

Now say we start with assignment (1, 0, 0).

What is the probability of a move to (0,0,0)?

Recall that Gibbs sampling implicitly constructs a Markov chain where the states are assignments of values to variables, and T(x, x') is the probability of moving from x to x'. One step in showing that the true joint distribution of this model (\pi) is the stationary distribution of this Markov chain, is to show that, for all x and x',
\pi(x) \cdot T(x, x') = \pi(x') \cdot T(x', x).
This is called detailed balance, which implies stationarity because it says the probability leaving is the same as the probability entering on every iteration. Let's verify the equation holds for a pair of states.

What is \pi((0,0,0)) \cdot T((0,0,0), (1,0,0))?

What is \pi((1,0,0)) \cdot T((1,0,0), (0,0,0))?

Is our condition satisfied for this case?

Another requirement on the correctness of Gibbs sampling is that the underlying Markov Chain is ergodic. A sufficient condition for this is that there be no 0's in the factors. What would happen if we changed our CPTs to:

Could we reach (0,1,0) from (0,0,0) (in possibly multiple steps)?

What is the probability of (0,1,0) in the joint distribution of the Bayes net?

Could we reach (1,1,1) from (0,0,0) (in possibly multiple steps)?

What is the probability of (1,1,1) in the joint distribution of the Bayes net?

Will Gibbs sampling give us samples from the true joint distribution in this case?

2) HMMs and Kalman Filters§

Indicate which of the following statements are true:

	The forward-backward algorithm for HMMs is exactly the same algorithm as the Viterbi algorithm.
	The forward-backward algorithm for HMMs is a special case of the sum-product algorithm.
	The Viterbi algorithm for HMMs is a special case of the max-product algorithm.
	Forward-backward is linear in the number of time steps.
	Filtering to maintain a state estimate after each new observation requires us to store the history of all forward messages.
	Running smoothing in an HMM efficiently to update estimates of past states after each new observation requires us to store the history of all forward messages.
	Kalman filtering requires, for each time step, a number of multiplication operations that is roughly cubic in the number of state variables.
	HMM filtering requires, for each time step, a number of multiplication operations that is roughly cubic in the dimension of the observation vector.
	Kalman filtering can be used to determine the true most likely state estimate at time t when the state is continuous and the transition and observation dynamics are linear and Gaussian.
	In Kalman filtering, if we only care about the maximum likelihood state estimate at time t, and not the covariance of our estimate, we don't need to compute the covariances at previous time steps.
	We can use an HMM model and algorithm to find an exact solution for the most likely state estimate at time t when the state is continuous and the transition and observation dynamics are linear and Gaussian.
	If we are using a HMM to make decisions at each timestep t, we only need to store the marginal over the state s_{t-1}, the transition potential from s_{t-1} to s_t and the measurement potential from o_t to s_t.

3) Kalman Filtering§

Consider the following one-dimensional continuous-state system:

x_t = a x_{t-1} + u_{t-1} + w_t

y_t = h x_t + v_t

where w_t \sim N(0, \sigma_w^2) and v_t \sim N(0, \sigma^2_v). This is the same linear-Gaussian model that we discussed in lecture, but we now have a control input u_t.

Let us say that a = 2, h = 10, \sigma^2_w = 1 and \sigma^2_v = .25. Let us also say that the initial estimate of the state is

p(x_0) = N \left ( \begin{bmatrix} \mu_0 \\ \sigma^2 \end{bmatrix} \right ) = N(0, 1)

Given a single action u_0 = 1, what is the mean of x_1?

Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).
Given a single action u_0 = 1, what is the variance of x_1?

Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).
After the action u_0 = 1, what is the expected measurement y_1?

Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).
Given a subsequent measurement y_1 = 0, what is the variance of the posterior on x_1?

Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).
Given a subsequent measurement y_1 = 0, what is the mean of the posterior on x_1?

Enter a number that is accurate to within 1.0e-5. You can also enter a python expression that will evaluate to a number (e.g., 3*2 + 4 - 7/11.0).

The covariance of the posterior will ...

	always get smaller over time
	always get bigger over time
	get bigger or smaller over time depending on the relative magnitudes of a to h
	get bigger or smaller over time depending on the relative magnitudes of \sigma_w to \sigma_v

4) Coding: Kalman Filter§

Let's implement the Kalman Filter for the one-dimensional continuous-state system from the previous section.

4.1) Kalman Filter Warmup§

Return an Gaussian object with mean=0, std=2. Please see the provided code for the definition of Gaussian.

For reference, our solution is 1 line(s) of code.

4.2) Kalman filter process model§

Implement the process step of a 1D Kalman filter, given a Gaussian prior distribution, a process model a, \sigma^2_w, and a control u.

For reference, our solution is 4 line(s) of code.

4.3) Kalman filter measurement model§

Implement the measurement step of a 1D Kalman filter, given a prior mean and variance, a measurement model h, \sigma^2_v, and a received measurement y.

For reference, our solution is 18 line(s) of code.

4.4) Kalman filter§

Combine the process and measurement steps into a single estimator that takes a prior, a process model, a measurement model, a list of controls, measurements, and returns a posterior.

For reference, our solution is 9 line(s) of code.

In addition to all the utilities defined at the top of the Colab notebook, the following functions are available in this question environment: kf_measurement, kf_process. You may not need to use all of them.

4.5) Kalman Filter Visualization§

We have created a visualization tool in Colab for the filtering process. You can run the code blocks and an animation of the filtered Gaussian in each step will be shown. There is also a plot showing the line chart of ground-truth locations, measurements, and posteriors.

Run the animation that visualizes the distribution of x_t after each process update and observation update. Use the "next frame" button (to the right of "play") to see what happens one step at a time.

After a process step, does the variance of x_t increase or decrease?

After we get an observation and compute the posterior, does the variance of x_t increase or decrease?

Now look at the plot below the animation. The plot shows a green region that is the Kalman filter's mean +/- one stdev for each timestep after the observation update.

If the model being used by the Kalman filter is correct, roughly what percentage of time would you expect the ground truth location to be outside the green interval?

If there were no process noise, which of the following would happen? (Mark all that apply)

	As time passes, the green area will shrink.
	As time passes, the measurements (in orange) will stay inside the green region.
	As time passes, the Kalman filter mean will get closer to the ground truth.

5) Particle Filter§

Consider a domain in which the forward transition dynamics are “hybrid” in the sense that

P(X_t = x_t \mid X_{t-1} = x_{t-1}) = p * N(x_{t-1} + 1, 0.1)(x_t) + (1-p) * N(x_{t-1}-1, 0.1)(x_t)

that is, with probability p, the state will hop forward one unit in expectation, and with probability 1-p, it will hop backward one unit in expectation, with variance 0.1 in each case.

Assume additionally that the observation model P(Y_t = y_t \mid X_t = x_t) = \mathtt{Uniform}(x_{t}-1, x_{t}+1)(y_t)

What is E[X_t \mid X_{t-1} = x]?

x-2

x + 2p - 1

p^2 * x

p * x
Assuming p=0.5, what is Var[X_t \mid X_{t-1} = x]?

0.1

0.2

1.1

0.05
You know the initial state of the system X_0 = 0. Your friend Norm thinks it’s fine to initialize a particle filter with a single particle at 0. What do you think?

This is fine and we can continue with a single particle.

We should initialize our pf with N copies of this particle.
Norm runs the filter for two steps with no observations several times and is trying to decide whether there could be bugs in the code. Assuming p=0.5, for each of the following sets of particles, indicate whether it is (a) fairly likely (b) quite unlikely (c) completely impossible:
1. {-2.05, -1.95, -0.1, 0.1, 1.9, 2.1}
  
  a
  
  b
  
  c
2. {-2.01, -1.9, -1.0, 0.1, 0, 2.1}
  
  a
  
  b
  
  c
3. {-20, -2.01, -2.001, .01, .001, 1.99, 1.999}
  
  a
  
  b
  
  c
Norm initializes the filter with particles {-2.05, -1.95, -0.1, 0.1, 1.9, 2.1} and then gets an observation of -1.0. Which of the following is a plausible posterior, assuming resampling without noise?

{-1.95, -0.1, -1.95, -1.95, -0.1, -0.1, -0.1}

{-1.95, -.01}

{0.1, 0.1, 0.1, -0.1, -0.1, -0.1}

{-1.96, -0.11, -1.94, -1.97, -0.11, -0.09, -0.12}

5.1) Particle Filter Code and Visualization§

We've created a visualization and code simulation for this problem, using the same hybrid dynamics model at the start of this problem. You can run the code block to run the particle filter on the problem, experiment with different parameters, and plot the results.

Sometimes this particle filter experiences “particle exhaustion” when all the particle weights are 0. Mark all that are true:

	Particle exhaustion happens when the current observation is impossible given all the particles.
	Particle exhaustion happens when the current motion is impossible given all the particles.
	If we get data from the same actual system as described above, but changed the motion model in the particle filter to be a single Gaussian, then we would be less likely to experience particle exhaustion.
	If we get data from the same actual system as described above, but changed the observation model in the particle filter to be Gaussian (rather than uniform), we would be less likely to experience particle exhaustion.
	If we used fewer particles, we would be less likely to experience particle exhaustion.
	If we get data from the same actual system as described above, but changed the motion model in the particle filter to not have Gaussian noise (so moving +1 with probability p, and -1 with probability 1-p), then we would be more likely to experience particle exhaustion.

Experiment with the number of particles. The animation shows the location of the particles and ground truth after each timestep. The line plot shows the location over time, as well as a green shaded region indicating the particle filter's mean +/- one stdev of the particles.

Roughly how many particles do you need to avoid particle exhaustion for 20 steps?

As you increase the number of particles, is the ground truth within the green region more or less frequently?

What roughly is the maximum distance from a particle to the ground truth?

6) Propositional Logic§

6.1) Model checking§

Let’s explore proof by model checking. We are interested in finding out which other sentences are entailed by the sentence (A \Rightarrow B) \Rightarrow C. We can do this by comparing the sets of models (assignments of truth values to propositions) in which the sentences are true.

Consider a domain with three propositional variables, A, B, and C.

Recall that M(\gamma) is the set of models in which sentence \gamma is true. We can conclude that sentence \alpha entails sentence \beta via model-checking if

Choose the correct one:

M(\alpha) \subseteq M(\beta)

M(\beta) \subseteq M(\alpha)

M(\alpha) \subset M(\beta)

M(\beta) \subset M(\alpha)

M(\beta) \equiv M(\alpha)

|M(\beta)| > |M(\alpha)|
Check all models in which (A \Rightarrow B) \Rightarrow C is true.

A=t, B=t, C=t

A=t, B=t, C=f

A=t, B=f, C=t

A=t, B=f, C=f

A=f, B=t, C=t

A=f, B=t, C=f

A=f, B=f, C=t

A=f, B=f, C=f
Check all models in which A \Rightarrow (B \Rightarrow C) is true.

A=t, B=t, C=t

A=t, B=t, C=f

A=t, B=f, C=t

A=t, B=f, C=f

A=f, B=t, C=t

A=f, B=t, C=f

A=f, B=f, C=t

A=f, B=f, C=f

Check all that are true.

	(A \Rightarrow B) \Rightarrow C and A \Rightarrow (B \Rightarrow C) are equivalent
	(A \Rightarrow B) \Rightarrow C entails A \Rightarrow (B \Rightarrow C)
	A \Rightarrow (B \Rightarrow C) entails (A \Rightarrow B) \Rightarrow C

6.2) Concepts in logic§

Consider a domain with propositions A, B, C, and D, and the particular model m = \{A = t, B = f, C = t, D = f\}. For each of these sentences, indicate whether it is valid, unsatisifiable, not valid but true in m, or not unsatisifiable but false in m.

Recall that a sentence is valid if the sentence is true in all models, and recall that a sentence is unsatisfiable if the sentence is false in all models.

A \Rightarrow \neg A
\neg (A \wedge B) \Rightarrow (\neg A \vee \neg B)
B \Rightarrow C \wedge D
A \Rightarrow C \wedge D
(A \wedge C) \Leftrightarrow (B \wedge D)
A \vee B \vee C \vee D
D \Leftrightarrow \neg D

7) Propositional Logic Semantics Review§

7.1) Propositional Sentence Evaluation§

Note: for these questions, refer to the top of the Colab notebook. Write a function that takes a propositional sentence and evaluates it against a single model. You may find python's builtin isinstance useful. For example, isinstance(sentence, And) returns whether a sentence is an And.

For reference, our solution is 15 line(s) of code.

8) Propositional Logic Coding Problems§

8.1) Warmup§

In this problem, CNF formulas are represented as lists of lists of nonzero integers. The sign of the integer represents whether the corresponding proposition is negated or not. For example, the formula ((x1 or not x2) and (x3 or x2)) would be represented as [[1, -2], [3, 2]]. Complete the following function to confirm your understanding of this representation.

For reference, our solution is 2 line(s) of code.

8.2) DPLL (Optional!)§

This question is optional. If you are interested, you can read about DPLL and its pseudocode in AIMA section 7.6.

Use your helper functions to complete an implementation of DPLL.

For reference, our solution is 59 line(s) of code.

9) Logic for State Estimation§

Consider a "search and rescue" robot who will be charged with navigating a sometimes dangerous grid to find and help people in need. Assume a partially observable domain, consider the subproblem of inferring what locations in a grid contain smoke or fire based on a limited set of observations and our knowledge about the relationship between smoke and fire.

9.1) Warming up with sympy§

In this problem, we will be using the sympy library for propositional logic to do inference. This will redefine the logic data structures and operations we were using in the previous problems and give us much faster inference.

Please take a moment to review the documentation: https://docs.sympy.org/latest/modules/logic.html.

Then complete the following warm up questions.

Feel free to import any additional components from the library.

9.1.1) Sympy Warmup 1§

Use sympy to determine whether the following formula is satisfiable: (\neg x_1 \land x_2) \Rightarrow ((x_2 \lor x_3) \land (x_1 \lor \neg x_3)).

Note that the return type should be bool.

For reference, our solution is 4 line(s) of code.

9.1.2) Sympy Warmup 2§

Use sympy to determine whether the following formula is satisfiable: (x_1 \lor x_2) \land (\neg x_1 \lor \neg x_2) \land (x_1 \lor \neg x_2) \land (\neg x_1 \lor x_2).

For reference, our solution is 4 line(s) of code.

9.2) Search and Rescue§

Now, we will look at the actual inference problem that we are interested in. We will consider several problems with varying grid sizes and different sets of observations. For example, consider the grid below:

# Fire, Unknown, Clear, Smoke
GRID0 = np.array([
  ["F", "U", "C"],
  ["S", "C", "U"],
  ["U", "U", "C"]
], dtype=object)

This grid has 9 locations and 5 observations: there is fire in the top left, smoke below it, and the center, top right, and bottom right locations are all known to be clear of smoke or fire.

We will assume the following axioms:

Each location has exactly one of {smoke, fire, clear}.
There is smoke at a location if and only if there is a fire in at least one of the adjacent (above, below, left, right) locations. Diagonals are not adjacent!

Note that as a corollary, there cannot be two fires in adjacent locations.

Take a moment to run your human inference engine: which unknown values in the grid above can be determined?

9.2.1) Search and Rescue Inference§

Write a program that takes a grid as input and infers unknown values.

Your program should output a new grid with all determinable unknown values replaced with the inferred value. If an unknown value cannot be determined, it should be left unknown.

Your program should use sympy.

For reference, our solution is 57 line(s) of code.

9.2.2) Beware§

Consider the following grid:

GRIDX = np.array([
    ["U", "U", "S", "U"],
    ["C", "U", "U", "S"],
    ["S", "U", "U", "U"],
    ["C", "S", "S", "S"]])

Use the staff solution for infer_unknown_values(click View Answer) on this input. The result should violate your expectations.

Note the line in infer_unknown_values:

       for elm in ["C", "S", "F"]:

Change that line so the that it reads:

        for elm in ["F", "S", "C"]:

Now what do you get? What's going on?

Choose all that apply:

	The inference code is buggy.
	The smoke observations in this case are not consistent with any fire arrangements.
	The smoke observations in this case require having fires in adjacent grid cells.
	The first value in the `elm` list that is entailed by the axioms is placed in the map.
	Anything follows from a contradiction.

10) (Bonus) Real-World Applications§

The ideas we've covered in this homework (Kalman Filters and Propositional Logic) were extremely influential in AI and underpin several important modern applications.

Kalman filters were instrumental to getting the first Apollo mission to the moon and back. It is currently used in almost every scenario where the pose of a system must be estimated from noisy sensor data, such as estimating the positions of airplanes in the sky, or to tracking the motion of humans wearing VR headsets. Modern research is even trying to connect Kalman filtering to large-scale machine learning systems.

Inference based on propositional logic is used in a very wide range of applications related to math and software engineering. Modern SAT solvers are used to do things from checking orders for parts that make up car configurations to verifying the correctness of circuits. In fact, Conda --- the popular package management system for Python --- uses a SAT solver under the hood to check whether different python package versions clash. Additionally, a big trend in current research is to combine the 'common sense' within large scale machine-learning models --- such as Large Language Models (LLMs) --- with the verifiability and correctness of formal logic. Here are some relevant recent papers:

11) Feedback§

How many hours did you spend on this homework?
Do you have any comments or suggestions (about the problems or the class)?

	This is fine and we can continue with a single particle.
	We should initialize our pf with N copies of this particle.

	{-1.95, -0.1, -1.95, -1.95, -0.1, -0.1, -0.1}
	{-1.95, -.01}
	{0.1, 0.1, 0.1, -0.1, -0.1, -0.1}
	{-1.96, -0.11, -1.94, -1.97, -0.11, -0.09, -0.12}

	M(\alpha) \subseteq M(\beta)
	M(\beta) \subseteq M(\alpha)
	M(\alpha) \subset M(\beta)
	M(\beta) \subset M(\alpha)
	M(\beta) \equiv M(\alpha)
	\|M(\beta)\| > \|M(\alpha)\|

	A=t, B=t, C=t
	A=t, B=t, C=f
	A=t, B=f, C=t
	A=t, B=f, C=f
	A=f, B=t, C=t
	A=f, B=t, C=f
	A=f, B=f, C=t
	A=f, B=f, C=f

	x-2
	x + 2p - 1
	p^2 * x
	p * x

	0.1
	0.2
	1.1
	0.05