Data Science Collective

Advice, insights, and ideas from the Medium data science community



Automated Testing: A Software Engineering Concept Data Scientists Must Know To Succeed

How to test your code, and why it matters in data science and your career.

18 min read · Sep 6, 2025


Why you should read this article

Most data scientists whip up a Jupyter Notebook, play around in some cells, and then maintain entire data processing and model training pipelines in the same notebook.

The code is tested once, when the notebook is first written, and then neglected for an indeterminate stretch of time (days, weeks, months, even years) until:

  • The notebook needs to be rerun to regenerate outputs that were lost.
  • The notebook needs to be rerun with different parameters to retrain a model.
  • Something upstream has changed, and the notebook needs to be rerun to refresh downstream datasets.

Many of you will have felt a shiver run down your spine reading this…

Why?

Because you instinctively know that this notebook is never going to run.

You know in your bones that the code in that notebook will need hours of debugging at best, and a rewrite from scratch at worst.



Published in Data Science Collective



Written by Benjamin Lee

Data Scientist in Financial Crime and Anti-Money Laundering