CS 463, Fall 2012, last assignment

Due Friday, Oct. 30th, at the beginning of class.

Given the description of the MDP M, do the following:

Use value iteration to find an optimal policy with no discount (gamma = 1) and horizon 7 (in other words, compute V^7 for every state).
Use value iteration to find an optimal infinite-horizon, discounted policy with discount factor gamma = 0.95.
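Both tasks use the same Bellman backup. The first repeats it exactly 7 times from V^0 = 0 with gamma = 1. The second repeats it with gamma = 0.95 until the values stop changing. The description of M is not reproduced here, so the sketch below runs on a small hypothetical 3-state MDP. The format P[state][action] = list of (probability, next_state, reward) triples, the transition-based rewards R(s, a, s'), and every function name are assumptions; if M gives rewards per state, R(s), fold that reward into each triple. Note that a finite-horizon optimal policy can depend on how many steps remain, so the policy returned for horizon 7 is the greedy action with 7 steps to go.

```python
from typing import Dict, List, Optional, Tuple

# Assumed tabular MDP format (hypothetical, not the assignment's M):
# P[state][action] = list of (probability, next_state, reward) triples.
# A state with no actions is treated as terminal (value 0).
Transitions = List[Tuple[float, str, float]]
MDP = Dict[str, Dict[str, Transitions]]


def q_value(P: MDP, V: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def bellman_backup(P: MDP, V: Dict[str, float], gamma: float):
    """One synchronous sweep: returns (new values, greedy policy w.r.t. V)."""
    new_V: Dict[str, float] = {}
    policy: Dict[str, Optional[str]] = {}
    for s, actions in P.items():
        if not actions:  # terminal state
            new_V[s], policy[s] = 0.0, None
            continue
        # max() keeps the first action on ties, in dict insertion order.
        best = max(actions, key=lambda a: q_value(P, V, s, a, gamma))
        new_V[s], policy[s] = q_value(P, V, s, best, gamma), best
    return new_V, policy


def finite_horizon_vi(P: MDP, horizon: int, gamma: float = 1.0):
    """Compute V^horizon starting from V^0 = 0 (no discount by default).

    The returned policy is the greedy action with `horizon` steps to go.
    """
    V = {s: 0.0 for s in P}
    policy: Dict[str, Optional[str]] = {}
    for _ in range(horizon):
        V, policy = bellman_backup(P, V, gamma)
    return V, policy


def infinite_horizon_vi(P: MDP, gamma: float, tol: float = 1e-8):
    """Repeat Bellman backups until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        new_V, policy = bellman_backup(P, V, gamma)
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V, policy
        V = new_V


# Hypothetical 3-state example: "G" is a terminal goal state.
P: MDP = {
    "A": {"right": [(1.0, "B", 0.0)], "stay": [(1.0, "A", 0.0)]},
    "B": {"right": [(0.8, "G", 10.0), (0.2, "B", 0.0)],
          "left": [(1.0, "A", 0.0)]},
    "G": {},
}

V7, pi7 = finite_horizon_vi(P, horizon=7)           # gamma = 1
V_inf, pi_inf = infinite_horizon_vi(P, gamma=0.95)
print(V7, pi7)
print(V_inf, pi_inf)
```

For this toy MDP, the undiscounted recursion gives V^k(B) = 10 - 10(0.2)^k, so V^7(B) is about 9.99987. With gamma = 0.95 the fixed point is V(B) = 8/0.81, about 9.877. In both cases the policy is "right" in A and B.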

To hand in on paper:

For each run of value iteration, hand in the value function and the policy. For uniformity, please present both on the grid: draw the grid and, if possible, draw an arrow in each square indicating the direction of the suggested action. Use a separate grid for the value function. (You should have 4 grids: a policy grid and a value grid for each of the two runs.)
What you learned from this assignment.
Your favorite thing about CS 463 this semester.

Your value iteration (VI) code.
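The first hand-in item asks for a policy grid with one arrow per square and a separate value grid. If you want a quick text version to check against before drawing on paper, here is a minimal sketch. It assumes a hypothetical encoding in which grid states are (row, col) tuples and actions are named "up", "down", "left", and "right"; adjust both to match the actual MDP M.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: grid states are (row, col) tuples and actions are
# named "up", "down", "left", "right". Adjust to match the actual MDP M.
Cell = Tuple[int, int]
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def policy_grid(policy: Dict[Cell, Optional[str]], rows: int, cols: int) -> str:
    """One character per square: an arrow, '.' for terminal, '#' if absent."""
    lines = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            if (r, c) not in policy:
                cells.append("#")            # e.g. a wall / non-state square
            elif policy[(r, c)] is None:
                cells.append(".")            # terminal state, no action
            else:
                cells.append(ARROWS[policy[(r, c)]])
        lines.append(" ".join(cells))
    return "\n".join(lines)


def value_grid(V: Dict[Cell, float], rows: int, cols: int) -> str:
    """Values to two decimals in a separate grid; '#' marks absent squares."""
    lines = []
    for r in range(rows):
        cells = [f"{V[(r, c)]:7.2f}" if (r, c) in V else f"{'#':>7}"
                 for c in range(cols)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


print(policy_grid({(0, 0): "right", (0, 1): None, (1, 0): "up"}, 2, 2))
# > .
# ^ #
```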