HW 10 -- Bandits and FOL
Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.
This week's Colab notebook can be found here: Colab Notebook (Click Me!)
1) Sparse Sampling Theory§
Proving an error bound on the sparse sampling algorithm is beyond our scope, but let’s consider a simpler case, with horizon 1. So, really, we are interested in a contextual bandit problem.
- We find ourselves in a state s.
- There are only two actions.
Normally, we would compute Q(s, a) = \sum_{s'} R(s, a, s') P(s' | s, a) for each action and select the action with the largest expected reward. But what if the number of possible next states is so large that we can’t afford to compute Q? We can draw n samples s'_i \sim P( \cdot | s, a), where i=1,2,\cdots,n and estimate
\hat{Q}(s, a) = \frac{1}{n} R(s, a, s'_i).
Assume that 0 < R(s, a, s’) < \mathrm{Rmax} for all s, a, s’. Let's think about how far wrong our estimate \hat{Q} can be from the true Q value.
Note that we're developing the general case, but for your answers in this question, you can assume that \mathrm{Rmax} = 1.
Chernoff and Hoeffding to the rescue! The idea of tail bounds from probability, which tell us (assuming independent samples) how the quality of our estimates depends on the size of the sample set. One version of their bound, applied to our problem, says that if we estimate \hat{Q} as above, then
P(|\hat{Q}(s, a) - Q(s, a)| \ge \epsilon) \le 2 \exp(-2n\epsilon^2 / \mathrm{Rmax}^2).
-
What's the upper bound for Q(s, a) for all s and a?
Please input your answer as a number.
-
You want to be at least 95% sure that your \hat{Q} estimates differ from the correct value by at most 0.1, how many samples would you need to take (at least)?
Please input your answer as a single integer.
-
How does the required number of samples, n, change if the size of state space S increases?
Increase Decrease Not Change -
What will happen to n if you want your estimates to differ from the correct value by at most 0.01 with probability 95%?
Increase by a factor of 10 Increase by a factor of 100 Increase by a factor of 1000 Decrease by a factor of 10 Decrease by a factor of 100 Decrease by a factor of 1000 Not Change -
How does n change (relative to your answer above) if you want to be 99% sure your estimates differ from the correct value by at most 0.01?
Increase by a factor of 2 Increase by a factor of 10 Decrease by a factor of 2 Decrease by a factor of 10 Not Change
2) Bandit§
-
Recall the
super-simple
bandit problem in lecture (see L.20 lecture note for details). In the super-simple bandit example, which of the following priors would mean that we couldn't maintain a factored belief representation?For each arm, P(p_i = .1) = .8, P(p_i = .5) = .1, P(p_i = .9) = .1 We know that all the arms are identical to each other, so with probability 1/3 all p_i=.1, with probability 1/3 all p_i=.5, and with probability 1/3 all p_i=.9 One arm has the prior P(p_i = .1) = .5, P(p_i = .5) = .5. For other arms, P(p_i = .1) = .8, P(p_i = .5) = .1, P(p_i = .9) = .1 -
What would happen if we tried using the most-likely observation determinization on a Bernoulli bandit problem?
Mark all that are true.It actually wouldn't apply because the underlying POMDP state space is infinite. Assuming there are only two arms, it would always make one of two plans: always pick right arm or always pick left arm. It would behave the same way as always picking the arm with the most total successes. It would behave the same way as always picking the arm with the highest ratio of \alpha / (\alpha + \beta) where \alpha, \beta are the parameters of the posterior beta distribution.
3) Semantics of FOL§
3.1) This question entails...§
\forall x . \;P(x) \models \forall y . \;P(y) | |
\forall x . \;P(x) \models \neg \exists x . \;\neg P(x) | |
\forall x . \;P(x) \models \exists x . \;P(x) | |
\exists x . \;P(x) \models \forall x . \;P(x) | |
\exists x . \;P(x) \models \exists y . \;P(y) | |
\forall x . \;P(x) \Rightarrow Q(x) \models \forall x . \;P(x) \land Q(x) | |
\exists x . \;P(x) \Rightarrow Q(x) \models \exists x . \;P(x) \land Q(x) | |
\forall x . \;P(x) \Rightarrow \exists y . \;Q(y) \models \exists z . \;Q(z) | |
\forall x . \;\exists y . \;R(x, y) \models \exists y . \;\forall x . \;R(x, y) | |
\forall x . \;\forall y . \;R(x, y) \Leftrightarrow R(y, x) \models \forall x . \;R(x, x) |
3.2) Modeling clay§
Consider the following set of sentences:
- \text{isNumber}(0)
- \forall x . \; \text{isNumber}(x) \Rightarrow \text{isNumber}(\text{successor}(x))
- \forall x, y . \; \text{greater}(x, y) \Rightarrow (x = \text{successor}(y) \lor \exists z . \; x = \text{successor}(z) \land \text{greater}(z, y))
- \forall x, y . \; \text{greater}(x, y) \Leftarrow (x = \text{successor}(y) \lor \exists z . \; x = \text{successor}(z) \land \text{greater}(z, y))
-
Consider the following model:
Universe = {happy, grumpy, sleepy, doc, sneezy} # these are actual dwarves Map from constant symbols to objects: {‘0’ -> doc} Predicate: isNumber = {doc} Function: successor = {{happy, happy}, {grumpy, grumpy}, {sleepy, sleepy}, {sneezy, sneezy}, {doc, doc}} Relation: greater = {{happy, happy}}
Which sentences are true in this model?Enter a Python list whose elements are in {1, 2, 3, 4}. -
Now consider the model
Universe = Natural numbers Map from constant symbols to objects: {‘0’ -> 0} Function: successor = lambda x: x + 1 Relation: greater = > relation on integers Predicate: isNumber = Universe
Which sentences are true in this model?Enter a Python list whose elements are in {1, 2, 3, 4}. -
Finally, consider the model where
Universe = Natural numbers Map from constant symbols to objects: {‘0’ -> 1000} Function: successor = lambda x: x Relation: greater = all pairs of integers Predicate: isNumber = Universe
Which sentences are true in this model?Enter a Python list whose elements are in {1, 2, 3, 4}.
4) First-order Logic Syntax and Semantics§
4.1) Atom Evaluation§
Note: for these questions, refer to the top of the Colab notebook. Write a function that takes a FOL atom and evaluates it against a single model.
For reference, our solution is 10 line(s) of code.
4.2) First-order Logic Sentence Evaluation§
Use your implementation of evaluate_atom to complete the following implementation of FOL sentence evaluation.
For reference, our solution is 37 line(s) of code.
In addition to all the utilities defined at the top of the Colab notebook, the following functions are available in this question environment: evaluate_atom
. You may not need to use all of them.
5) Formalization§
5.1) First words§
For each of the following, write down a first-order logical sentence that captures the meaning of the English remark, using the allowed predicates. Your formulation may be different from ours; if you can't get what we're looking for, write a private question on Piazza.
- Every student who takes AI has fun.
Mark all that have the intended meaning\forall x . \;\text{IsStudent}(x) \land \text{TakesAI}(x) \land \text{HasFun}(x) \forall x . \;\neg\text{IsStudent}(x) \lor \neg \text{TakesAI}(x) \land \text{HasFun}(x) \forall x . \;\neg\text{IsStudent}(x) \lor \neg \text{TakesAI}(x) \lor \text{HasFun}(x) \forall x . \;\text{IsStudent}(x) \land \text{TakesAI}(x) \Rightarrow \text{HasFun}(x) - All problem sets from all students have been graded at some time point before the end the semester.
We use the following abbreviations: \text{Pset}(p,s) means \text{ProblemSetByStudent}(p, s), \text{Graded}(p, t) means \text{GradedAtTime}(p, t) and \text{End}(t) means \text{EndOfSemester}(t). \text{Graded}(p, t) is true if and only if p is graded exactly at t, and \text{End}(t) is true if and only if t is exactly the end of semester. We also use \text{Before}(t, t'), which means time point t is strictly before t'.
In the following logic formulas, we are going to use t, t' for time, s for students, and p for problemsets.Mark all that have the intended meaning\forall t'. \lnot \text{End}(t') \lor (\forall s,p. \text{Pset}(p,s) \Rightarrow (\exists t. \text{Before}(t,t')\land \text{Graded}(p,t))) \forall s,p,t.\exists t'. \left(\text{Pset}(p,s) \land \text{Graded}(p,t)\lor \lnot \text{Before}(t,t')\Rightarrow \lnot \text{End}(t')\right) \forall s,p,t,t'. \left(\text{Pset}(p,s)\land \text{End}(t')\land \lnot \text{Before}(t,t') \Rightarrow \lnot \text{Graded}(p,t)\right) \exists t'. \forall s,p,t. \left(\text{Pset}(p,s)\land \text{Graded}(p,t)\land \text{End}(t')\Rightarrow \text{Before}(t,t')\right) - This one is particularly tricky. Each breakout room has at least three unique students.
We use the following abbreviations: \text{In}(s,r) means \text{InBreakoutRoom}(s,r). \textit{Eq}(s_1, s_2) is a predicate that is true when s_1 and s_2 are the same student.
In the following logic formulas, we are going to use r for rooms, and s_1, s_2, s_3, s_4 for students.Mark all that have the intended meaning\forall r,s_1,s_2,s_3. \left(\text{In}(s_1,r) \land \text{In}(s_2,r)\land \text{In}(s_3,r)\Rightarrow \lnot \text{Eq}(s_1, s_2) \land \lnot \text{Eq}(s_1, s_3) \land \lnot \text{Eq}(s_2, s_3)\right) \forall r. \exists s_1,s_2,s_3. \text{In}(s_1, r) \land \text{In}(s_2, r) \land \text{In}(s_3, r) \land \lnot \text{Eq}(s_1, s_2) \land \lnot \text{Eq}(s_1, s_3) \land \lnot \text{Eq}(s_2, s_3) \forall r. \exists s_1. \forall s_2,s_3. \exists s_4. \lnot \text{Eq}(s_2, s_4) \land \lnot \text{Eq}(s_3, s_4) \land \text{In}(s_1, r) \land \text{In}(s_4, r) \forall r. \exists s_1,s_2,s_3. \left(\text{In}(s_1, r) \land \text{In}(s_2, r) \land \text{In}(s_3, r) \Rightarrow \lnot \text{Eq}(s_1, s_2) \land \lnot \text{Eq}(s_1, s_3) \land \lnot \text{Eq}(s_2, s_3)\right) - People who work from home can never work and relax in the same room.
In the following logic formulas, we are going to use p for people, and r for rooms.Mark all that have the intended meaning\forall p .\; \forall r . \;\text{WorksFromHome}(p) \land \text{WorksInRoom}(p, r) \Rightarrow \neg \text{RelaxesInRoom}(p, r) \neg \exists p, r . \;\text{WorksFromHome}(p) \land \text{WorksInRoom}(p, r) \land \text{RelaxesInRoom}(p, r) \neg \exists p, r . \;\text{WorksFromHome}(p) \Rightarrow \text{WorksInRoom}(p, r) \land \text{RelaxesInRoom}(p, r) \forall p .\;\neg \exists r . \;\text{WorksFromHome}(p) \land \text{WorksInRoom}(p, r) \land \text{RelaxesInRoom}(p, r) \forall p .\;\neg \exists r . \;\text{WorksFromHome}(p) \land \text{WorksInRoom}(p, r) \Rightarrow \text{RelaxesInRoom}(p, r)
6) FOL Proof§
6.1) FOL clausal form§
Convert each of these sentences into clausal form.
Enter each formula as a list of lists of literal strings. A literal
string is either a term, e.g. 'O(x,y)'
or the negation of a term,
e.g. '~O(x,y)'
. A typical clause will look like: ['D(y)',
'~O(x,y)', '~L(x)']
. And a CNF formula is a list of clauses. If you
need a Skolem function, call it sk
, e.g. sk(x)
. In these
problems, you will not need more than one Skolem function. Do not
include any spaces in the strings.
-
Anybody who owns a dog is an animal lover!
\forall x . \; (\exists y . \; D(y) \land O(x,y)) \Rightarrow L(x)
-
Every animal lover owns a dog!
\forall x . \; L(x) \Rightarrow (\exists y . \; D(y) \land O(x,y))
-
Dr. Evil doesn’t own a dog. From this, we would like to conclude that he is not an animal lover. We can draw this conclusion based on one of the sentences above. Which one?
Enter a number (1 or 2)
6.2) FOL resolution§
Now, we will use the sentence you chose in part 5.1.3 as an assumption, also assume that Dr. Evil does not own a dog, and then prove from these assumptions that Dr. Evil is not an animal lover. The first step in doing a resolution proof is to convert the premises to clausal form. You already did part of that! Below, you will select each of the premises that we will need for a proof.
-
Clause 1 One clause from the previous problem that involves a dog, that is, predicate D. Recall the process of standardizing variables apart for clauses that will be used in a resolution proof.
Choose one:~D(x1) D(sk(x1)) ~L(x1) v D(x2) ~L(x1) v D(sk(x1)) ~D(x2) v ~O(x1, x2) v L(x1) -
Clause 2 One clause from the previous problem that involves an owner, that is, predicate O.
Choose one:~O(x2,x3) O(x2,sk(x2)) ~L(x2) v O(x2,x3) ~L(x2) v O(x2,sk(x2)) ~D(x3) v ~O(x2, x3) v L(x3) -
Clause 3 We need to add to our premises that Dr. Evil doesn’t own a dog. Choose the clausal form; use constant symbol ‘DrE’ for Dr. Evil.
Choose one:O(DrE, x3) ~O(DrE, x3) O(DrE, x3) v D(x3) ~O(DrE, x3) v ~D(x3) ~O(DrE, x3) v D(x3) -
Clause 4 We also need to negate the desired conclusion and convert it to clausal form.
Choose one:L(x4) ~L(x4) L(sk()) ~L(sk()) L(DrE) ~L(DrE) -
Clause 5 Now, we can actually do the proof, using FOL resoluton, starting with the four clauses above. We get another clause by resolving clauses 1 and 4.
Enter a clause as a list of lists of literal strings. -
Clause 6 We get the next clause by resolving clauses 2 and 4.
Enter a clause as a list of lists of literal strings. -
Clause 7 We then resolve clauses 3 and 5.
Enter a clause as a list of lists of literal strings. -
Now we get a contradiction between clauses 6 and 7 and we're done!
7) First-order Logic Inference§
7.1) FOL Binary Resolution§
Complete the following implementation of first-order binary resolution. Given two clauses, return all possible new clauses that result from applying the binary resolution rule.
Hint 1: The helper function unify
may return an empty dict for certain inputs. Note that this means the inputs
can be successfully unified without any variable substitution.
Hint 2: In your code, you can use if unify(a, b, {}) is not None
to test if the unification is successful.
For reference, our solution is 8 line(s) of code.
7.2) FOL Resolution Prover§
Complete the following implementation of a first-order resolution prover. Given a sentence in CNF form, and a single query clause, check if the sentence entails the query. See unit tests for examples.
For reference, our solution is 17 line(s) of code.
In addition to all the utilities defined at the top of the Colab notebook, the following functions are available in this question environment: binary_resolution
. You may not need to use all of them.
8) Feedback§
-
How many hours did you spend on this homework?
-
Do you have any comments or suggestions (about the problems or the class)?