1. Markov Reward Process
1) Closed-Form
2) Dynamic Programming (DP)
2. Markov Decision Process
1) Policy Evaluation
2) Policy Iteration
- CS234 (Compute infinite horizon value of a policy!)
- MIT 16.410
3) State-Action Value Q
The Q function Q^pi(s, a) measures how good it is to be in state s, take action a, and then follow policy pi from the next state onward. Formally, it is the expected discounted return under that behavior.
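As a minimal sketch of this definition, Q^pi can be computed from V^pi with a one-step lookahead: Q^pi(s, a) = R(s, a) + gamma * sum over s' of P(s' | s, a) * V^pi(s'). The tiny two-state MDP below (the tables `P` and `R`, the discount `gamma`, the policy, and the helper name `q_from_v`) is a hypothetical example made up for illustration, not something taken from the course material.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a][s2] = probability of landing in state s2 after taking action a in state s.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays in 0, action 1 moves to 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to 0, action 1 stays in 1
]
# R[s][a] = expected immediate reward for taking action a in state s.
R = [[0.0, 1.0], [0.0, 2.0]]
gamma = 0.5

# Assumed policy pi: always take action 1. Its value can be worked out by hand:
# V(1) = 2 / (1 - 0.5) = 4 and V(0) = 1 + 0.5 * V(1) = 3.
V_pi = [3.0, 4.0]


def q_from_v(V, P, R, gamma):
    """One-step lookahead: take action a now, then follow the policy whose value is V."""
    n_states = len(V)
    return [
        [
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(len(R[s]))
        ]
        for s in range(n_states)
    ]


Q_pi = q_from_v(V_pi, P, R, gamma)
print(Q_pi)
```

In this sketch Q_pi[s][1] equals V_pi[s], as expected, because the assumed policy always picks action 1.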
4) Policy Improvement
5) Value Iteration
Compute the optimal value for a finite horizon of k steps, then increase k until the values converge!
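The horizon-k view can be sketched as follows, assuming a small tabular MDP with known dynamics. Start from V_0 = 0 (no steps left) and apply k Bellman optimality backups, so that the result is the optimal value with k steps to go. The MDP tables `P` and `R`, the discount `gamma`, and the function name `value_iteration` are hypothetical, chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a][s2] = probability of landing in state s2 after taking action a in state s.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays in 0, action 1 moves to 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to 0, action 1 stays in 1
]
# R[s][a] = expected immediate reward for taking action a in state s.
R = [[0.0, 1.0], [0.0, 2.0]]
gamma = 0.5


def value_iteration(P, R, gamma, k):
    """Return the optimal value function for a horizon of k steps."""
    n_states = len(P)
    V = [0.0] * n_states  # V_0: with zero steps left, no reward can be collected
    for _ in range(k):
        # One backup: V_{i+1}(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) * V_i(s') ]
        V = [
            max(
                R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
    return V


for k in range(4):
    print(k, value_iteration(P, R, gamma, k))
```

With gamma < 1 the values approach the infinite-horizon optimum as k grows. In practice one would usually stop when successive iterates differ by less than a small tolerance rather than fix k in advance.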
6) Bellman backup operator