TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Temporal-Difference Learning: Combining Dynamic Programming and Monte Carlo Methods for Reinforcement Learning

Milestones of RL: Q-Learning and Double Q-Learning

15 min read · Oct 17, 2024

We continue our deep dive into Sutton and Barto’s book “Reinforcement Learning: An Introduction” [1], and in this post introduce Temporal-Difference (TD) learning, which is covered in Chapter 6 of the book.

TD learning can be viewed as a combination of Dynamic Programming (DP) and Monte Carlo (MC) methods, which we introduced in the previous two posts. It marks an important milestone in the field of Reinforcement Learning (RL) because it combines the strengths of both: like MC, TD learning does not need a model and learns from experience alone; like DP, it “bootstraps”, that is, it updates its estimates using previously established estimates.
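To make the idea of bootstrapping concrete, here is a minimal sketch of a single tabular TD(0) value update. The function name `td0_update`, the parameters `alpha` (step size) and `gamma` (discount factor), and the two-state value table are illustrative assumptions for this sketch, not code from the accompanying repository.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update to the value table V in place and return the TD error.

    Hypothetical helper for illustration: V is a dict mapping states to value estimates.
    """
    # Bootstrapped target: the sampled reward (experience, as in MC) plus the
    # current estimate of the next state's value (an existing estimate, as in DP).
    target = reward + (0.0 if terminal else gamma * V[next_state])
    td_error = target - V[state]
    # Move the estimate a fraction alpha towards the target.
    V[state] += alpha * td_error
    return td_error


# One observed transition A -> B with reward 1.0 (made-up numbers).
V = {"A": 0.0, "B": 0.5}
delta = td0_update(V, "A", reward=1.0, next_state="B", alpha=0.1, gamma=0.9)
```

Note that the update needs only one sampled transition, not a model of the environment and not a complete episode return, which is the combination of MC and DP properties described above.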

Photo by Brooke Campbell on Unsplash

Here, we introduce this family of methods, both from a theoretical standpoint and through relevant practical algorithms such as Q-learning, accompanied by Python code. As usual, all code can be found on GitHub.

We begin with an introduction and motivation, and then start with the prediction problem, as in the previous posts. Then, we dive deeper into the theory and discuss which solution TD learning finds. Following that, we move to the control problem, and present a…

Published in TDS Archive

Written by Oliver S

PhD in ML, working as research / software engineer