
Automated Testing: A Software Engineering Concept Data Scientists Must Know To Succeed

How to test your code, and why it matters in data science and your career.

18 min read · Sep 6, 2025

Why you should read this article

Most data scientists whip up a Jupyter Notebook, play around in some cells, and then maintain entire data processing and model training pipelines in the same notebook.

The code is tested once, when the notebook is first written, and is then neglected for some indeterminate amount of time (days, weeks, months, even years) until:

The notebook needs to be rerun to regenerate outputs that were lost.
The notebook needs to be rerun with different parameters to retrain a model.
Something changed upstream, and the notebook needs to be rerun to refresh downstream datasets.

Many of you will have felt shivers down your spine reading this…

Why?

Because you instinctively know that this notebook is never going to run.

You know in your bones that the code in that notebook will need hours of debugging at best, and a rewrite from scratch at worst.

Data Science Collective


Published in Data Science Collective

Written by Benjamin Lee