Diffusion Models via RL

Training Diffusion Models with Reinforcement Learning (Black et al., 2023) applies RLHF-style fine-tuning to diffusion models, aiming at the generation of higher-quality images.

The noising and denoising processes in a diffusion model assume the Markov property; the reverse (denoising) distribution to learn is

$ p_{\theta} (x_{t-1} | x_t) $
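The forward (noising) process has a convenient closed form: $x_t$ can be sampled directly from $x_0$ via $q(x_t | x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I)$, where $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. A minimal sketch (plain Python, treating an "image" as a list of floats; the linear beta schedule is a common DDPM default, not something specific to this paper):

```python
import math
import random

def forward_noise(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form.

    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I),
    where alpha_bar_t is the cumulative product of (1 - beta_s).
    """
    alpha_bar = 1.0
    for s in range(t):
        alpha_bar *= 1.0 - betas[s]
    eps = [random.gauss(0.0, 1.0) for _ in x0]   # standard Gaussian noise
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

# linear beta schedule over T = 1000 steps (common DDPM choice)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
xt, eps = forward_noise([1.0, -0.5, 0.3], 500, betas)
```

At $t = 0$ no noise has been added yet, so the sample is exactly $x_0$; as $t \to T$, $\bar\alpha_t \to 0$ and $x_t$ approaches pure Gaussian noise.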

The loss function for a diffusion model is the simplified noise-prediction objective: $ L_{simple}(\theta) = \mathbb{E}_{t, x_0, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right] $
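The standard DDPM training objective reduces to a mean squared error between the true noise $\epsilon$ injected in the forward process and the network's prediction $\epsilon_\theta(x_t, t)$. A minimal sketch (plain Python, vectors as lists; in practice this would be a batched tensor op):

```python
def ddpm_simple_loss(eps_true, eps_pred):
    """Simplified DDPM objective: mean squared error between the noise
    added in the forward process and the model's prediction of it."""
    n = len(eps_true)
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / n
```

A perfect prediction gives zero loss, and gradients of this MSE are what train the UNet noise predictor.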

With diffusion models clearly defined, the question becomes: how can human feedback be incorporated into this process via RL?

DDPO (Denoising Diffusion Policy Optimization) treats the denoising chain as a multi-step decision process and bounds the scope of each policy update by clipping the importance-sampling ratio, as in PPO.

Image quality is rewarded according to human-relevant criteria: compressibility, aesthetic quality, and alignment with the human prompt.
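As an example of one such criterion, a compressibility reward can score an image by how small it is after compression. The paper measures JPEG file size; the sketch below uses zlib-compressed byte length as an illustrative stand-in, since it needs no image library:

```python
import os
import zlib

def compressibility_reward(image_bytes: bytes) -> float:
    """Illustrative compressibility reward: smaller compressed size ->
    higher reward. The DDPO paper uses JPEG file size; zlib is used
    here only as a dependency-free proxy."""
    return -float(len(zlib.compress(image_bytes)))

# a flat image compresses far better than random noise
flat = bytes(64 * 64)
noisy = os.urandom(64 * 64)
```

Maximizing this reward pushes the model toward simpler, more compressible images; flipping the sign gives the paper's incompressibility objective.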

Reward function for prompt alignment: BERTScore is used as the reward, measuring the similarity between the input prompt and LLaVA's textual description of the generated image.
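The reward interface is simple: caption the generated image with a vision-language model, then score the caption against the prompt. BERTScore matches token embeddings by cosine similarity; the sketch below substitutes plain token-overlap F1, which is only an illustrative stand-in with the same interface:

```python
def token_f1(prompt: str, caption: str) -> float:
    """Illustrative stand-in for BERTScore: token-level F1 between the
    input prompt and a VLM caption of the generated image. Real
    BERTScore matches contextual BERT embeddings, not raw tokens."""
    p, c = prompt.lower().split(), caption.lower().split()
    common = sum(min(p.count(w), c.count(w)) for w in set(p))
    if common == 0:
        return 0.0
    precision = common / len(c)   # matched tokens / caption length
    recall = common / len(p)      # matched tokens / prompt length
    return 2 * precision * recall / (precision + recall)
```

A caption that restates the prompt scores 1.0; an unrelated caption scores near 0, so maximizing the reward steers generation toward prompt-faithful images.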

Paper:
Black, Kevin, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. "Training diffusion models with reinforcement learning." arXiv:2305.13301 (May 22, 2023). https://arxiv.org/abs/2305.13301

Blog of Carper.ai, Sept 27, 2023, https://carper.ai/enhancing-diffusion-models-with-reinforcement-learning/
Code: https://github.com/carperai/drlx

UNet generation: https://colab.research.google.com/drive/1EeumZtGdCIPmQPzH0MsU9ttWBaQCi-9s?usp=sharing

Gaussian Error Linear Units:

https://paperswithcode.com/method/gelu
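GELU, the activation used throughout the UNet, is $\mathrm{GELU}(x) = x \cdot \Phi(x)$ with $\Phi$ the standard normal CDF. Both the exact form and the common tanh approximation are short:

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF,
    computed via the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU from the original paper."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, GELU is smooth and weights inputs by their probability under a standard normal, which is why it slightly suppresses small negative values instead of zeroing them.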

meetup event:

https://www.meetup.com/aifrontiers/events/296391910/