CS 463, Fall 2012, last assignment
Due Friday, Oct. 30th, at the beginning of class.
Given the description of the MDP M,
- Use value iteration to find the optimal policy with no discount and horizon 7 (in other words, find V^7 for all states).
- Use value iteration to find an optimal infinite-horizon, discounted
policy with discount (gamma) = 0.95.
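
Both tasks above use the same Bellman backup, so a minimal sketch is given here as a starting point only. Everything in it is an assumption for illustration: the MDP M is not described in this handout, so the code runs on a made-up two-state MDP (TOY), and the names backup, finite_horizon_vi, discounted_vi, the transition encoding, and the tolerance tol are all hypothetical. It also assumes every state has at least one action.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (not from the assignment):
# P[state][action] = [(probability, next_state, reward), ...]
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Made-up two-state MDP used only to exercise the code; it is NOT the MDP M.
TOY: Transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def backup(P: Transitions, V: Dict[str, float], gamma: float):
    """Apply one Bellman backup; return the new values and the greedy policy."""
    new_V, policy = {}, {}
    for s, actions in P.items():
        best_a, best_q = None, float("-inf")
        for a, outcomes in actions.items():
            # Expected immediate reward plus (discounted) value of the successor.
            q = sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            if q > best_q:  # ties go to the action listed first
                best_a, best_q = a, q
        new_V[s], policy[s] = best_q, best_a
    return new_V, policy


def finite_horizon_vi(P: Transitions, horizon: int, gamma: float = 1.0):
    """Return V^horizon and the greedy action with `horizon` steps to go."""
    V = {s: 0.0 for s in P}  # V^0 = 0 for every state
    policy: Dict[str, str] = {}
    for _ in range(horizon):
        V, policy = backup(P, V, gamma)
    return V, policy


def discounted_vi(P: Transitions, gamma: float, tol: float = 1e-9):
    """Iterate backups until the largest change in any state's value is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V, policy = backup(P, V, gamma)
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V, policy
        V = new_V
```

On the toy MDP, finite_horizon_vi(TOY, 7) gives V^7(A) = 13 and V^7(B) = 14, and discounted_vi(TOY, 0.95) converges to roughly V(A) = 39 and V(B) = 40, with the policy "go" in A and "stay" in B in both cases. Note that a finite-horizon policy can in general depend on the number of steps remaining; this sketch returns only the action for the first of the 7 steps.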
To hand in on paper:
- For each of the two value iteration runs above, hand in the value function and the policy.
For uniformity, please give the value function and the policy on the grid, and
if possible, use arrows to indicate the action for each state.
In other words, you should draw the grid and draw arrows indicating,
for each square, the direction of the suggested action. Use a separate grid
for the value function. (You should have 4 grids.)
- What you learned from this assignment.
- Your favorite thing about CS 463 this semester.
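
For the grids requested in the first item above, a text rendering can serve as a draft before drawing them on paper. The sketch below is only an illustration under assumptions not stated in this handout: states are keyed by (row, column), actions are named "up", "down", "left", and "right", and the helper names render_policy and render_values are hypothetical.

```python
# Assumed action names; adjust to whatever the MDP description actually uses.
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def render_policy(policy, rows, cols):
    """Return one text line per grid row, with an arrow for each square's action."""
    lines = []
    for r in range(rows):
        # Squares with no action (walls, terminal states) are shown as ".".
        cells = [ARROWS.get(policy.get((r, c)), ".") for c in range(cols)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


def render_values(V, rows, cols):
    """Return one text line per grid row, with each square's value to 2 decimals."""
    lines = []
    for r in range(rows):
        # Squares missing from V are shown as a right-aligned ".".
        cells = [f"{V[(r, c)]:7.2f}" if (r, c) in V else "      ." for c in range(cols)]
        lines.append(" ".join(cells))
    return "\n".join(lines)
```

Calling print(render_policy(policy, rows, cols)) and print(render_values(V, rows, cols)) once per value iteration run would produce the four grids in text form.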
To hand in via email:
Your VI code.