Convergence

This section analyzes how to guarantee that $V(s)$ converges under the TD(k) update.

4.1 Bellman Equations
For a sequence $\langle S_{t-1}, r_t, S_t \rangle$, we can derive different TD(k) rules.

  • No-action format: $k=0$, the $TD(0)$ rule
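    $$ V(s) = R(s) + r \sum_{s'} \Gamma(s, s') V(s') $$
    Here $r$ without a subscript is the discount factor, $\Gamma(s, s')$ is the probability of moving from $s$ to $s'$, and $r_t$ with a subscript is the reward observed at step $t$.
    The update rule for $V_t(S_{t-1})$ is as follows: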
    $$
    V_t(S_{t-1}) =
    \begin{cases}
    V_{t-1}(S_{t-1}) + \alpha_t\left[r_t + r V_{t-1}(S_t) - V_{t-1}(S_{t-1}) \right] & \\
    V_{t-1}(S_{t-1}) & \text{converge case}
    \end{cases}
    $$
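
The TD(0) update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the three-state chain `CHAIN`, the helper names `td0_update` and `run_td0`, the discount of 0.5, and the step size $\alpha_t = 1/n$ (with $n$ the number of visits to the state) are all assumptions made for the example. For this chain the Bellman equation gives $V(1) = 2$ and $V(0) = 1 + 0.5 \cdot 2 = 2$.

```python
def td0_update(V, s_prev, reward, s_next, alpha, discount):
    """One TD(0) step: move V[s_prev] toward reward + discount * V[s_next]."""
    td_error = reward + discount * V[s_next] - V[s_prev]
    V[s_prev] += alpha * td_error
    return td_error


# Hypothetical deterministic chain 0 -> 1 -> 2, where state 2 is terminal.
# Each entry maps a state to (next_state, reward received on that transition).
CHAIN = {0: (1, 1.0), 1: (2, 2.0)}
TERMINAL = 2


def run_td0(episodes=1000, discount=0.5):
    """Run TD(0) over repeated episodes and return the value estimates."""
    V = {0: 0.0, 1: 0.0, TERMINAL: 0.0}  # terminal value stays at 0
    visits = {0: 0, 1: 0}
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            s_next, reward = CHAIN[s]
            visits[s] += 1
            alpha = 1.0 / visits[s]  # assumed decaying step size alpha_t = 1/n
            td0_update(V, s, reward, s_next, alpha, discount)
            s = s_next
    return V
```

Calling `run_td0()` returns estimates close to the Bellman solution. State 1 reaches its value after the first episode, after which its TD error is zero and the update leaves the estimate unchanged, which corresponds to the converge case in the rule above. State 0 approaches its value more slowly because its target depends on the estimate for state 1 and its step size shrinks with every visit.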