Data Science Collective

Advice, insights, and ideas from the Medium data science community

Planning and Learning in Reinforcement Learning

Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode VI

20 min read · Feb 11, 2025

In this series, we have explored fundamental techniques and key concepts in Reinforcement Learning (RL). We began with Dynamic Programming (DP), then moved on to Monte Carlo (MC) and Temporal Difference (TD) methods. In the previous post, we introduced a unifying framework for MC and TD methods: n-step TD (TD-n), which interpolates smoothly between these two extremes.

Following Chapter 8 of Sutton and Barto’s book [1], this post aims for a similar unification, this time bridging model-based and model-free methods. As the name suggests, model-based approaches require a model of the environment, meaning its transition probabilities and rewards; DP is a prime example. In contrast, model-free methods learn purely from sampled experience, without requiring an explicit model; MC and TD methods fall into this category.

In RL terminology, model-based methods are often associated with planning, while model-free methods are described as learning. However, the two have much in common: both estimate value functions, and both improve those estimates by applying update operations toward a target. In this post, we will explore these connections in greater detail and introduce a unified perspective on planning and learning.
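To make the distinction concrete before diving in, here is a minimal sketch that contrasts a planning update with a learning update on the same tabular Q-function. Everything in it is an assumption made for illustration and is not taken from the book or from later parts of this series: the two-state toy MDP in `MODEL`, the constants `GAMMA` and `ALPHA`, and all function names. The planning update reads transition probabilities directly from the model, while the learning update only ever sees one sampled transition at a time.

```python
import random

GAMMA = 0.9  # discount factor (assumed for this toy example)
ALPHA = 0.1  # step size for the sample-based update (assumed)

# Hypothetical two-state MDP: MODEL[state][action] -> list of (prob, next_state, reward).
MODEL = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def new_q():
    """Create a Q-table initialised to zero for every state-action pair."""
    return {s: {a: 0.0 for a in MODEL[s]} for s in MODEL}


def planning_update(q, state, action):
    """Model-based expected update: average over all transitions in the model."""
    q[state][action] = sum(
        prob * (reward + GAMMA * max(q[next_state].values()))
        for prob, next_state, reward in MODEL[state][action]
    )


def learning_update(q, state, action, reward, next_state):
    """Model-free Q-learning update: move towards a target from one sample."""
    target = reward + GAMMA * max(q[next_state].values())
    q[state][action] += ALPHA * (target - q[state][action])


def sample_transition(state, action, rng):
    """Stand-in for the real environment; the learner never reads MODEL itself."""
    outcomes = MODEL[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights)[0]
    return next_state, reward


def run_planning(sweeps=200):
    """Planning: repeatedly sweep over all state-action pairs using the model."""
    q = new_q()
    for _ in range(sweeps):
        for state in MODEL:
            for action in MODEL[state]:
                planning_update(q, state, action)
    return q


def run_learning(steps=50_000, seed=0):
    """Learning: act randomly in the environment and update from experience."""
    rng = random.Random(seed)
    q = new_q()
    state = 0
    for _ in range(steps):
        action = rng.choice(list(MODEL[state]))
        next_state, reward = sample_transition(state, action, rng)
        learning_update(q, state, action, reward, next_state)
        state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for each state."""
    return {s: max(q[s], key=q[s].get) for s in q}


q_plan = run_planning()
q_learn = run_learning()
print("planning:", q_plan)
print("learning:", q_learn)
```

Both routines have the same shape: compute a target of the form "reward plus discounted value of the next state" and move the estimate towards it. They differ only in where the target comes from, either an expectation under the model or a single sampled transition. With a constant step size, the learned values keep fluctuating around the planned ones rather than matching them exactly.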

Photo by Google DeepMind on Unsplash

Published in Data Science Collective

Written by Oliver S

PhD in ML, working as research / software engineer