Planning and Learning in Reinforcement Learning
Dissecting “Reinforcement Learning” by Richard S. Sutton with Custom Python Implementations, Episode VI
In this series, we have explored fundamental techniques and key concepts in Reinforcement Learning (RL). We began with Dynamic Programming (DP), then moved on to Monte Carlo (MC) and Temporal Difference (TD) methods. In the previous post, we introduced a unifying framework for MC and TD methods: n-step TD (TD-n), which interpolates smoothly between these two extremes.
Following Chapter 8 of Sutton’s book [1], this post aims for a similar unification, this time bridging model-based and model-free methods. As the name suggests, model-based approaches require a model of the environment, that is, something that predicts next states and rewards for a given state and action. DP is a prime example. In contrast, model-free methods learn purely from experience, without requiring an explicit model; MC and TD methods fall into this category.
In RL terminology, model-based methods are usually said to rely on planning, while model-free methods rely on learning. The two nonetheless have much in common: at their core, both compute value functions. In this post, we explore these connections in greater detail and introduce a unified perspective on planning and learning.
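To make the distinction above concrete, here is a minimal sketch that contrasts a planning update with a learning update. It assumes a hypothetical three-state chain environment. The names `step`, `plan_with_model` and `learn_from_experience` are illustrative only and are not taken from the book or from this series' code. Value iteration stands in for planning and Q-learning for learning.

```python
import random

# Hypothetical 3-state chain: states 0 and 1 are non-terminal, state 2 is terminal.
# Actions: 0 = left, 1 = right. Entering state 2 yields a reward of 1.
N_STATES, N_ACTIONS, TERMINAL, GAMMA = 3, 2, 2, 0.9


def step(state, action):
    """Deterministic dynamics: returns (next_state, reward)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def plan_with_model(sweeps=50):
    """Planning (model-based): value iteration queries the model for every (s, a)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(TERMINAL):
            for a in range(N_ACTIONS):
                s2, r = step(s, a)  # here `step` is used as a known model
                q[s][a] = r + GAMMA * max(q[s2])
    return q


def learn_from_experience(episodes=2000, alpha=0.5, seed=0):
    """Learning (model-free): Q-learning only sees transitions it experiences."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.randrange(N_ACTIONS)  # uniformly random behaviour policy
            s2, r = step(s, a)  # here `step` plays the role of the real environment
            q[s][a] += alpha * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    print("planning:", plan_with_model())
    print("learning:", learn_from_experience())
```

Both functions apply the same kind of backup, `r + GAMMA * max(q[s2])`. They differ only in where the transition comes from: the planner sweeps over all state-action pairs by querying the model, while the learner updates only the pairs it happens to visit. In this small deterministic example the two should end up with essentially the same action values.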

