Basic model of Reinforcement Learning
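A basic model for reinforcement learning, covering the value function V(s), the action-value function Q(s, a), and the continuation value function C(s, a). Here R(s, a) denotes the immediate reward, r the discount factor, and Γ(s, a, s') the probability of moving from state s to state s' under action a.
- The Bellman equation in three representations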
Value for state s
$$ V(s)=\max_{a} \left[ R(s,a)+r\sum _{s'}\Gamma(s,a,s')V(s') \right] $$
Value for state s with action a
$$ Q(s,a)=R(s,a)+r \sum_{s'} \left[ \Gamma(s,a,s') \max _{a'} Q(s',a') \right] $$
Continuation value for state s with action a
$$ C(s,a)=r\sum_{s'} \left[ \Gamma(s,a,s') \max _{a'} \left( R(s',a')+C(s',a') \right) \right] $$
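As a rough illustration of the V(s) equation above, the sketch below repeatedly applies its right-hand side until the values stop changing (value iteration). The two-state, two-action model, its numbers, and the names `REWARD`, `TRANSITION`, `DISCOUNT`, `bellman_backup_v`, and `solve_v` are all hypothetical and chosen only for this example.

```python
# Hypothetical 2-state, 2-action model; every number here is made up.
STATES = [0, 1]
ACTIONS = [0, 1]
DISCOUNT = 0.9  # plays the role of r in the equations

# REWARD[s][a] plays the role of R(s, a).
REWARD = [
    [1.0, 0.0],
    [0.0, 2.0],
]

# TRANSITION[s][a][s2] plays the role of Gamma(s, a, s').
# Each innermost row sums to 1.
TRANSITION = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]


def bellman_backup_v(v):
    """Apply the right-hand side of the V(s) equation once to every state."""
    return [
        max(
            REWARD[s][a]
            + DISCOUNT * sum(TRANSITION[s][a][s2] * v[s2] for s2 in STATES)
            for a in ACTIONS
        )
        for s in STATES
    ]


def solve_v(tol=1e-10, max_iters=10_000):
    """Iterate the backup from V = 0 until the largest change falls below tol."""
    v = [0.0 for _ in STATES]
    for _ in range(max_iters):
        new_v = bellman_backup_v(v)
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


V = solve_v()
print(V)
```

In this toy model, action 1 in state 1 pays 2 and stays in state 1 forever, so V(1) should approach 2 / (1 - 0.9) = 20.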
- Relations between the Bellman equations
Get V(s) from Q(s,a) or C(s,a)
$$ V(s) =\max_{a} Q(s,a) $$
$$ V(s) =\max_{a} \left[ R(s,a) + C(s,a) \right] $$
Get Q(s,a) from V(s) or C(s,a)
$$ Q(s,a) = R(s,a) + r\sum_{s'} \left[ \Gamma(s,a,s')V(s') \right] $$
$$ Q(s,a) = R(s,a) + C(s,a) $$
Get C(s,a) from V(s) or Q(s,a)
$$ C(s,a) = r\sum_{s'} \left[ \Gamma(s,a,s') V(s') \right] $$
$$ C(s,a) = r\sum_{s'} \left[ \Gamma(s,a,s') \max _{a'} Q(s',a') \right] $$
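The relations in this section can be checked numerically. The sketch below solves the Q and C equations independently by fixed-point iteration on a hypothetical two-state, two-action model, then measures how far the results are from satisfying Q = R + C, C = r Σ Γ V, and V = max (R + C). The model, its numbers, and all function names are assumptions made for illustration only.

```python
# Hypothetical 2-state, 2-action model; every number here is made up.
STATES = [0, 1]
ACTIONS = [0, 1]
DISCOUNT = 0.9  # plays the role of r in the equations
REWARD = [[1.0, 0.0], [0.0, 2.0]]  # REWARD[s][a] ~ R(s, a)
TRANSITION = [  # TRANSITION[s][a][s2] ~ Gamma(s, a, s')
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]


def discounted_expectation(s, a, values):
    """Return r * sum over s' of Gamma(s, a, s') * values[s']."""
    return DISCOUNT * sum(TRANSITION[s][a][s2] * values[s2] for s2 in STATES)


def update_q(q):
    """One sweep of the Q(s, a) equation."""
    best_next = [max(q[s2]) for s2 in STATES]
    return [
        [REWARD[s][a] + discounted_expectation(s, a, best_next) for a in ACTIONS]
        for s in STATES
    ]


def update_c(c):
    """One sweep of the C(s, a) equation."""
    best_next = [
        max(REWARD[s2][a2] + c[s2][a2] for a2 in ACTIONS) for s2 in STATES
    ]
    return [
        [discounted_expectation(s, a, best_next) for a in ACTIONS]
        for s in STATES
    ]


def iterate(update, table, sweeps=500):
    """Apply an update a fixed number of times (enough for this small model)."""
    for _ in range(sweeps):
        table = update(table)
    return table


zeros = [[0.0 for _ in ACTIONS] for _ in STATES]
Q = iterate(update_q, zeros)
C = iterate(update_c, zeros)
V = [max(Q[s]) for s in STATES]  # V(s) = max_a Q(s, a)

pairs = [(s, a) for s in STATES for a in ACTIONS]
# Largest violation of Q(s, a) = R(s, a) + C(s, a).
gap_q = max(abs(Q[s][a] - (REWARD[s][a] + C[s][a])) for s, a in pairs)
# Largest violation of C(s, a) = r * sum_s' Gamma(s, a, s') V(s').
gap_c = max(abs(C[s][a] - discounted_expectation(s, a, V)) for s, a in pairs)
# Largest violation of V(s) = max_a [R(s, a) + C(s, a)].
gap_v = max(
    abs(V[s] - max(REWARD[s][a] + C[s][a] for a in ACTIONS)) for s in STATES
)
print(gap_q, gap_c, gap_v)
```

All three gaps should be close to zero, up to floating-point error. Q and C are computed from separate recursions here, so the agreement is a check on the relations and is not built in by construction.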