Temporal-Difference Learning: Combining Dynamic Programming and Monte Carlo Methods for Reinforcement Learning

Milestones of RL: Q-Learning and Double Q-Learning

15 min read · Oct 17, 2024

We continue our deep dive into Sutton and Barto’s book “Reinforcement Learning: An Introduction” [1], and in this post introduce Temporal-Difference (TD) Learning, which is covered in Chapter 6 of that work.

TD learning can be viewed as a combination of Dynamic Programming (DP) and Monte Carlo (MC) methods, which we introduced in the previous two posts. It marks an important milestone in the field of Reinforcement Learning (RL) because it combines the strengths of both: like MC, TD learning does not need a model and learns from experience alone; like DP, it “bootstraps”, meaning it updates its estimates based on previously established estimates.
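To make the combination concrete, here is a minimal sketch of a single tabular TD(0) update. It assumes the value estimates live in a plain dictionary; the function name `td0_update`, the state labels, and the step size and discount values are hypothetical and chosen only for illustration. They are not taken from the accompanying GitHub code.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error.

    `values` is assumed to map each state to its current value estimate.
    """
    # Bootstrapping (as in DP): reuse the current estimate of the next state.
    bootstrap = 0.0 if terminal else values[next_state]
    # Sampling (as in MC): the reward comes from experience, not from a model.
    td_target = reward + gamma * bootstrap
    td_error = td_target - values[state]
    # Move the estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: one observed transition A -> B with reward 1.
values = {"A": 0.0, "B": 0.5}
td_error = td0_update(values, "A", reward=1.0, next_state="B",
                      alpha=0.1, gamma=0.9)
```

The update needs only one observed transition, so it can run after every step instead of waiting for the end of an episode as MC does.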

Here, we introduce this family of methods, both from a theoretical standpoint and through practical algorithms such as Q-learning, accompanied by Python code. As usual, all code can be found on GitHub.

We begin with an introduction and motivation, and then start with the prediction problem, as in the previous posts. Then, we dive deeper into the theory and discuss which solution TD learning finds. Following that, we move to the control problem, and present a…

Published in TDS Archive

Written by Oliver S