Basic model of Reinforcement Learning

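This note covers the basic model for reinforcement learning: the value function V(s), the action-value function Q(s, a), and the continuation value function C(s, a). Throughout, R(s, a) is the immediate reward, r is the discount factor, and Γ(s, a, s') is the probability of moving from state s to state s' under action a.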

  1. The Bellman equation in three different representations

Value for state s
$$ V(s)=\max_{a} \left[ R(s,a)+r\sum _{s'}\Gamma(s,a,s')V(s') \right] $$

Value for state s with action a
$$ Q(s,a)=R(s,a)+r \sum_{s'} \left[\Gamma(s,a,s') * \max _{a'} Q(s',a') \right] $$

Continuation value for state s with action a
$$ C(s,a)=r\sum_{s'} \left[ \Gamma(s,a,s')*\max _{a'} (R(s',a')+C(s',a')) \right] $$
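
Each of the three equations can be treated as a fixed-point update and iterated from zero. The following is a minimal sketch under stated assumptions, not a definitive implementation: the two-state, two-action MDP, its numbers, and the names `REWARD`, `TRANSITION`, `DISCOUNT`, `expect`, and `iterate_*` are all hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up.
STATES = (0, 1)
ACTIONS = (0, 1)
DISCOUNT = 0.9  # the factor written r in the equations

# REWARD[s][a] stands in for R(s, a).
REWARD = ((1.0, 0.0),
          (0.0, 2.0))

# TRANSITION[s][a][s2] stands in for Gamma(s, a, s'); each row sums to 1.
TRANSITION = (
    ((0.8, 0.2), (0.1, 0.9)),
    ((0.5, 0.5), (0.3, 0.7)),
)


def expect(s, a, next_value):
    """Expected value of next_value(s') under Gamma(s, a, .)."""
    return sum(TRANSITION[s][a][s2] * next_value(s2) for s2 in STATES)


def iterate_v(sweeps=500):
    """Repeatedly apply the equation for V(s), starting from V = 0."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [max(REWARD[s][a] + DISCOUNT * expect(s, a, lambda s2: v[s2])
                 for a in ACTIONS)
             for s in STATES]
    return v


def iterate_q(sweeps=500):
    """Repeatedly apply the equation for Q(s, a), starting from Q = 0."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(sweeps):
        q = [[REWARD[s][a] + DISCOUNT * expect(s, a, lambda s2: max(q[s2]))
              for a in ACTIONS]
             for s in STATES]
    return q


def iterate_c(sweeps=500):
    """Repeatedly apply the equation for C(s, a), starting from C = 0."""
    c = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(sweeps):
        c = [[DISCOUNT * expect(
                  s, a,
                  lambda s2: max(REWARD[s2][a2] + c[s2][a2] for a2 in ACTIONS))
              for a in ACTIONS]
             for s in STATES]
    return c


print("V:", iterate_v())
print("Q:", iterate_q())
print("C:", iterate_c())
```

With a discount factor below 1 each update is a contraction, so a few hundred sweeps are more than enough for this toy problem. A real implementation would more likely stop once successive iterates differ by less than a tolerance.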

  2. Relations between the Bellman equations

Get V(s) from Q(s,a) and C(s,a)
$$ V(s) =\max_{a} Q(s,a) $$
$$ V(s) =\max_{a} \left[ R(s,a) + C(s,a) \right] $$

Get Q(s,a) from V(s) and C(s,a)
$$ Q(s,a) = R(s,a) + r\sum_{s'} \left[ \Gamma(s,a,s')V(s') \right] $$
$$ Q(s,a) = R(s,a) + C(s,a) $$

Get C(s,a) from V(s) and Q(s,a)
$$ C(s,a) = r\sum_{s'} \left[ \Gamma(s,a,s') V(s') \right] $$
$$ C(s,a) = r\sum_{s'} \left[ \Gamma(s,a,s') \max _{a'} Q(s',a') \right] $$
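
These conversions can be checked numerically with a round trip: solve for V, derive C from V, derive Q from C, then recover V from Q. The sketch below assumes a made-up two-state, two-action MDP. The helper names (`solve_v`, `c_from_v`, `q_from_c`, `v_from_q`, `c_from_q`) are hypothetical and serve only to mirror the relations one by one.

```python
# Hypothetical 2-state, 2-action MDP; the numbers are illustrative only.
STATES = (0, 1)
ACTIONS = (0, 1)
DISCOUNT = 0.9  # r in the equations
REWARD = ((1.0, 0.0), (0.0, 2.0))  # REWARD[s][a] = R(s, a)
TRANSITION = (                      # TRANSITION[s][a][s2] = Gamma(s, a, s')
    ((0.8, 0.2), (0.1, 0.9)),
    ((0.5, 0.5), (0.3, 0.7)),
)


def discounted_expectation(s, a, values):
    """r * sum over s' of Gamma(s, a, s') * values[s']."""
    return DISCOUNT * sum(TRANSITION[s][a][s2] * values[s2] for s2 in STATES)


def solve_v(sweeps=500):
    """Approximate V by repeatedly applying the Bellman equation for V."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [max(REWARD[s][a] + discounted_expectation(s, a, v)
                 for a in ACTIONS)
             for s in STATES]
    return v


def c_from_v(v):
    """C(s, a) = r * sum_{s'} Gamma(s, a, s') V(s')."""
    return [[discounted_expectation(s, a, v) for a in ACTIONS] for s in STATES]


def q_from_c(c):
    """Q(s, a) = R(s, a) + C(s, a)."""
    return [[REWARD[s][a] + c[s][a] for a in ACTIONS] for s in STATES]


def v_from_q(q):
    """V(s) = max_a Q(s, a)."""
    return [max(q[s]) for s in STATES]


def c_from_q(q):
    """C(s, a) = r * sum_{s'} Gamma(s, a, s') max_{a'} Q(s', a')."""
    return c_from_v(v_from_q(q))


v = solve_v()
c = c_from_v(v)
q = q_from_c(c)
print("V:", v)
print("V recovered from Q:", v_from_q(q))
```

If the relations hold, the two printed vectors agree up to floating-point error, and `c_from_q(q)` matches `c` in the same way.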