Reinforcement Learning · Diffusion Model via RL
The paper "Training Diffusion Models with Reinforcement Learning" applies RLHF to diffusion models, aiming to generate higher-quality images. The noising and denoising processes in a diffusion model assume the Markov property and a fixed final latent distribution; the model learns the reverse transition $p_{\theta}(x_{t-1} \mid x_t)$. The loss function for Diffusion
Reinforcement Learning · Multiplex Thinking vs Maximum Likelihood Reinforcement Learning (MLRL)
How two 2025 reasoning training paradigms independently rediscovered “optimize search success instead of single-trajectory accuracy”.
TL;DR: Both papers aim to train language models for Pass@K / best-of-N decoding success rather than single-sample correctness, but they approach the problem from completely different directions: Multiplex Thinking improves
Reinforcement Learning · Temporal-Difference Learning
Temporal-difference learning in reinforcement learning, including the variants you love: TD(0), TD(1), and TD(λ).
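A minimal sketch of tabular TD(λ) prediction with accumulating eligibility traces, to make the three variants concrete. The 5-state random walk, the step size `alpha`, and the function name `td_lambda` are illustrative assumptions, not taken from the post. Setting `lam=0.0` gives TD(0); `lam=1.0` gives an online, Monte Carlo-like TD(1).

```python
import random


def td_lambda(num_states=5, episodes=1000, alpha=0.05, gamma=1.0, lam=0.0, seed=0):
    """Tabular TD(lambda) prediction with accumulating eligibility traces.

    Hypothetical toy problem: a random walk over states 0..num_states-1 that
    starts in the middle. Stepping off the left end ends the episode with
    reward 0; stepping off the right end ends it with reward 1.
    lam=0.0 reduces to TD(0).
    """
    rng = random.Random(seed)
    V = [0.5] * num_states  # initial value estimates
    for _ in range(episodes):
        e = [0.0] * num_states  # eligibility traces, reset every episode
        s = num_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:  # left terminal
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= num_states:  # right terminal
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            delta = r + gamma * v_next - V[s]  # TD error
            e[s] += 1.0  # accumulate the trace of the current state
            for i in range(num_states):
                V[i] += alpha * delta * e[i]  # credit every state by its trace
                e[i] *= gamma * lam  # decay traces
            if done:
                break
            s = s_next
    return V
```

On this walk the true value of state i is (i + 1) / 6, so both `td_lambda(lam=0.0)` and `td_lambda(lam=0.5)` should return estimates near [1/6, 2/6, 3/6, 4/6, 5/6]; a constant `alpha` leaves some residual noise around those values.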
Reinforcement Learning · Basic Model of Reinforcement Learning
The basic model of reinforcement learning: the value function V(s), the Q-function Q(s, a), and the continuation value function C(s, a).
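A minimal value-iteration sketch relating the three functions. It assumes $C(s,a) = \gamma \sum_{s'} P(s' \mid s,a)\, V(s')$ is the discounted expected value of the next state, so that $Q(s,a) = R(s,a) + C(s,a)$ and $V(s) = \max_a Q(s,a)$. This convention, the function name `value_iteration`, and the two-state MDP are hypothetical illustrations; the post may define C(s, a) differently.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """P[s][a] is a list of (prob, s_next) pairs; R[s][a] is the immediate reward.

    Assumed (hypothetical) definitions:
      C(s, a) = gamma * sum_{s'} P(s'|s, a) * V(s')   continuation value
      Q(s, a) = R(s, a) + C(s, a)
      V(s)    = max_a Q(s, a)
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        C = [[gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in range(len(P[s]))]
             for s in range(n)]
        Q = [[R[s][a] + C[s][a] for a in range(len(P[s]))] for s in range(n)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, Q, C
        V = V_new
    return V, Q, C


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0
    [[(1.0, 1)], [(1.0, 0)]],  # state 1
]
R = [[1.0, 0.0], [2.0, 0.0]]  # staying pays 1 in state 0 and 2 in state 1
V, Q, C = value_iteration(P, R)
# V is approximately [18.0, 20.0]: move to state 1, then stay there forever.
```

The split makes the design explicit: C(s, a) carries everything about the future, so V(s) is simply the best immediate reward plus continuation value.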