Automated Testing: A Software Engineering Concept Data Scientists Must Know To Succeed
How to test your code, and why it matters in data science and your career.
Why you should read this article
Most data scientists whip up a Jupyter Notebook, play around in some cells, and then maintain entire data processing and model training pipelines in the same notebook.
The code is tested once, when the notebook is first written, and then neglected for an undetermined amount of time (days, weeks, months, even years) until:
- The notebook needs to be rerun to regenerate outputs that were lost.
- The notebook needs to be rerun with different parameters to retrain a model.
- Something changes upstream, and the notebook needs to be rerun to refresh downstream datasets.
Many of you will have felt shivers down your spine reading this…
Why?
Because you instinctively know that this notebook is never going to run.
You know in your bones that the code in that notebook will need hours of debugging at best, and a rewrite from scratch at worst.

