Reinforcement Learning - Afternoon with Green Tea

Reinforcement Learning

Reinforcement Learning, in the machine learning category

Diffusion Model via RL

“Training Diffusion Models with Reinforcement Learning” applies HFRL to diffusion models, aiming to generate higher-quality images. The noising and denoising processes in a diffusion model assume the Markov property and a fixed final latent distribution, and the model learns the reverse transition $p_{\theta}(x_{t-1} \mid x_t)$. The loss function for diffusion...
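
The teaser names the pieces without showing them, so below is a minimal NumPy sketch of the standard DDPM formulation they usually refer to. It is not the post's or the paper's code. It assumes a linear β schedule, a reverse-step variance of σ_t² = β_t, and a hypothetical placeholder noise predictor `eps_model`; all function names are illustrative. `q_sample` draws from the closed-form forward process, `simple_loss` is the noise-prediction MSE, and `reverse_step_log_prob` is the Gaussian log density of one denoising step, the quantity an RL fine-tuning method can treat as a policy's log-probability.

```python
import numpy as np

# Assumed linear beta schedule over T steps (DDPM-style defaults).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # bar(alpha)_t = prod_{s<=t} alpha_s


def q_sample(x0, t, noise):
    """Forward (noising) process: x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def eps_model(x_t, t):
    """Hypothetical noise predictor eps_theta(x_t, t); a trained network in practice."""
    return np.zeros_like(x_t)


def simple_loss(x0, t, rng):
    """Noise-prediction loss: squared error between true and predicted noise, averaged."""
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    return float(np.mean((noise - eps_model(x_t, t)) ** 2))


def reverse_step_log_prob(x_prev, x_t, t):
    """log p_theta(x_{t-1} | x_t) for a Gaussian reverse step with variance beta_t.

    Mean uses the eps-parameterization:
    mu = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta(x_t, t)) / sqrt(alpha_t)
    """
    eps = eps_model(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    var = betas[t]
    d = x_t.size
    return float(-0.5 * (np.sum((x_prev - mean) ** 2) / var + d * np.log(2 * np.pi * var)))


rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
print(simple_loss(x0, t=500, rng=rng))
```

Because each denoising step is an explicit Gaussian, its log-probability is cheap to evaluate, which is what makes a policy-gradient treatment of the sampling chain possible.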

Multiplex Thinking vs Maximum Likelihood Reinforcement Learning (MLRL)

How two 2025 reasoning-training paradigms independently rediscovered “optimize search success instead of single‑trajectory accuracy”. TL;DR: Both papers aim to train language models for Pass@K / best‑of‑N decoding success rather than single‑sample correctness, but they approach the problem from completely different directions: Multiplex Thinking improves...
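
To make the gap between single-sample correctness and Pass@K concrete, here is a minimal sketch of the standard unbiased pass@k estimator used in code-generation evaluation. It assumes n sampled answers per problem, of which c are correct; `pass_at_k` is an illustrative name, and the function only measures success. It does not implement either paper's training objective.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k), i.e. the probability that at least
    one of k samples drawn without replacement from the n is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# One problem, 10 sampled answers, 2 of them correct:
print(pass_at_k(10, 2, 1))   # single-sample accuracy (pass@1)
print(pass_at_k(10, 2, 5))   # best-of-5 success (pass@5)
```

With only 2 correct answers out of 10, pass@1 is low while pass@5 is much higher, which is why optimizing for search success can reward a model differently from optimizing single-trajectory accuracy.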

Convergence

An analysis of how to guarantee convergence of V(s) under TD(k) updates.
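
The exact TD(k) setting is not shown in this excerpt, so as a hedged reference point, here is the tabular TD(0) update together with one standard set of sufficient conditions for convergence: the Robbins-Monro step-size conditions. They assume a tabular value function, bounded rewards, every state visited infinitely often, and either γ < 1 or guaranteed episode termination. The visit counter N_t(s) is notation introduced here.

```latex
% Tabular TD(0) update for the visited state s_t with step size \alpha_t(s_t)
V_{t+1}(s_t) = V_t(s_t) + \alpha_t(s_t)\,\bigl[r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t)\bigr]

% Robbins-Monro step-size conditions, required for every state s
\sum_{t=0}^{\infty} \alpha_t(s) = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2(s) < \infty

% Example schedule satisfying both: N_t(s) = number of visits to s up to time t
\alpha_t(s) = \frac{1}{N_t(s)}
```

The first condition lets the updates travel arbitrarily far from a bad initial guess. The second makes the accumulated noise from sampled targets shrink to zero.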

Temporal-Difference Learning

Temporal-Difference Learning in reinforcement learning, covering TD(0), TD(1), and TD(λ).
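
A minimal sketch of tabular TD(λ) with accumulating eligibility traces. It assumes the classic five-state random walk: the walk starts in the middle, exiting on the right gives reward 1, and γ = 1, so the true values are 1/6 through 5/6. Setting `lam=0` gives TD(0) and `lam=1` gives TD(1). The function name, step size, and episode count are illustrative choices, not the post's.

```python
import random


def td_lambda_random_walk(lam, alpha=0.002, episodes=5000, n_states=5, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces on a random walk.

    Non-terminal states are 1..n_states; 0 and n_states + 1 are terminal.
    Each step moves left or right with equal probability. Exiting on the right
    gives reward 1, every other transition gives 0, and gamma = 1.
    """
    rng = random.Random(seed)
    gamma = 1.0
    right_end = n_states + 1
    V = [0.0] + [0.5] * n_states + [0.0]      # terminal values stay 0
    for _ in range(episodes):
        e = [0.0] * (n_states + 2)            # eligibility traces, reset per episode
        s = right_end // 2                    # start in the middle state
        while 0 < s < right_end:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == right_end else 0.0
            delta = r + gamma * V[s_next] - V[s]   # one-step TD error
            e[s] += 1.0                             # accumulating trace
            for i in range(1, right_end):
                V[i] += alpha * delta * e[i]        # credit every traced state
                e[i] *= gamma * lam                 # decay traces
            s = s_next
    return V[1:right_end]


for lam in (0.0, 0.5, 1.0):   # TD(0), TD(0.5), TD(1)
    print(lam, [round(v, 2) for v in td_lambda_random_walk(lam)])
```

With a constant step size, the estimates hover near the true values rather than converging exactly. A decaying step size such as 1/N(s) is the usual remedy.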

Basic model of Reinforcement Learning

The basic model of reinforcement learning: the value function V(s), the Q-function Q(s, a), and the continuation value function C(s, a).
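
A minimal sketch of how the three functions relate. It assumes rewards of the form R(s, a) and the convention Q(s, a) = R(s, a) + C(s, a), C(s, a) = γ Σ_{s'} T(s, a, s') V(s'), V(s) = max_a Q(s, a); the post may define C slightly differently. `solve_mdp` is an illustrative value-iteration routine, and the two-state MDP is hypothetical.

```python
def solve_mdp(T, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Value iteration returning V, Q and C for a small tabular MDP.

    T[s][a][s2]: probability of moving to s2 after taking a in s.
    R[s][a]:     immediate reward for taking a in s.
    Assumed convention for the continuation value C:
        C(s, a) = gamma * sum_{s2} T(s, a, s2) * V(s2)
        Q(s, a) = R(s, a) + C(s, a)
        V(s)    = max_a Q(s, a)
    """
    n_states = len(T)

    def backup(V):
        C = [[gamma * sum(p * V[s2] for s2, p in enumerate(T[s][a]))
              for a in range(len(T[s]))] for s in range(n_states)]
        Q = [[R[s][a] + C[s][a] for a in range(len(T[s]))] for s in range(n_states)]
        return C, Q

    V = [0.0] * n_states
    for _ in range(max_iter):
        _, Q = backup(V)
        V_new = [max(q_row) for q_row in Q]
        done = max(abs(v - w) for v, w in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    C, Q = backup(V)    # make C and Q consistent with the final V
    return V, Q, C


# Hypothetical two-state MDP: action 0 moves to state 0, action 1 moves to state 1;
# only taking action 1 in state 1 pays a reward of 1.
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 0.0],
     [0.0, 1.0]]
V, Q, C = solve_mdp(T, R)
print(V)   # approximately [9.0, 10.0] with gamma = 0.9
```

Splitting Q into R + C separates what is earned now from what is expected afterwards. That separation is the main reason to introduce C alongside V and Q.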