Getting started with F1 statistics and Python
Data preparation in Python for the analysis of F1 statistics with the Ergast dataset.
This tutorial describes how to use historic Formula One data for analysis. It covers obtaining the data, cleaning it, and two initial analyses (more will follow!). The main focus of this article is preparing the data set for analysis. It may feel like the dirty work, but good data preparation easily pays for itself.
The data is retrieved from the Ergast Developer API, which provides historical data on F1 races from 1950 onwards, although not all of the data is complete. Data is available up to the current season and covers all planned races as well as the results of all completed races.
The available data contains the following tables:
- Drivers — Information on all current and previous drivers
- Constructors — Information on all current and previous constructors
- Race results, for both constructors and drivers
- Qualifying results — Results of all qualifying sessions, including the separate Q1, Q2 and Q3 sessions.
- Lap times — Lap times of all completed laps by all drivers in all events
- Pit stops — All pit stops made, including when they happened and their duration (pit in to pit out)
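To give a feel for what one of these tables looks like in Python, here is a minimal sketch that flattens a drivers response into a list of rows. It assumes the JSON layout used by the Ergast API (an `MRData` wrapper containing a `DriverTable` with a `Drivers` list). The sample payload is written by hand for illustration rather than fetched from the API, and the function name `drivers_from_response` and the output column names are hypothetical.

```python
import json

# Hand-written sample that mimics the assumed shape of an Ergast drivers
# response. It is NOT fetched from the API and only holds two drivers.
SAMPLE_RESPONSE = """
{
  "MRData": {
    "series": "f1",
    "limit": "30",
    "offset": "0",
    "total": "2",
    "DriverTable": {
      "Drivers": [
        {"driverId": "hamilton", "givenName": "Lewis",
         "familyName": "Hamilton", "dateOfBirth": "1985-01-07",
         "nationality": "British"},
        {"driverId": "alonso", "givenName": "Fernando",
         "familyName": "Alonso", "dateOfBirth": "1981-07-29",
         "nationality": "Spanish"}
      ]
    }
  }
}
"""


def drivers_from_response(raw_json):
    """Flatten an Ergast-style drivers response into a list of row dicts.

    Hypothetical helper: the output keys are our own naming choice.
    """
    payload = json.loads(raw_json)
    drivers = payload["MRData"]["DriverTable"]["Drivers"]
    rows = []
    for driver in drivers:
        rows.append({
            "driver_id": driver["driverId"],
            "name": f'{driver["givenName"]} {driver["familyName"]}',
            # Older records may lack some fields, so fall back to None.
            "date_of_birth": driver.get("dateOfBirth"),
            "nationality": driver.get("nationality"),
        })
    return rows


rows = drivers_from_response(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

A list of flat dictionaries like this can be handed straight to a data frame library for further cleaning. Using `.get()` for the optional fields reflects the earlier caveat that not all historical data is complete.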

