What Is Data Science?
And what’s all the hype about?
I naturally receive a steady stream of (technical) data science questions from my blogs and YouTube videos. Recently, however, I’ve been getting questions like “how do I get started in data science?” To answer this, I will first define what we are talking about by giving my take on what data science is. This will lead into the next blog of this series, which will discuss how to get started in data science.
Disclaimer
It is worth emphasizing that data science is a massive field, so what it means and how it looks certainly vary across organizations. Nevertheless, here I give my impression of the space as someone who has worked in data science for the past few years in academic and business settings.
The Big Picture
Before diving into the specifics of data science, I find it helpful to think of the wider context in which data science typically operates. To give it a name, we can call this larger context the “data space”, but I’m sure others will call it by alternative names (some may even call it all data science).
As I see it, there are typically 5 “distinct” roles in this space. Viewed together, they show the importance of each role in going from zero to real-world impact with data.
Each role serves a larger data pipeline starting from data engineering and concluding with ML operations. While, again, these roles may operate under different labels across organizations, the way I think of them is as follows:
- Data Engineering: Transform real-world data into something better suited for analysis (e.g. ETL, data infrastructure)
- Data Analytics: Tell stories with data (e.g. data visualization, dashboards, summary statistics, EDA, and presentations; not so much programming)
- Data Science: Tell stories with data and build models (i.e. data analytics + more programming and model development)
- ML Operations/Engineering: Deploy models into the real world (i.e. real-world model implementation)
- Data Management: Keep track of data about data (e.g. organizing databases and metadata)
Caveat
Although I describe these 5 roles as “distinct”, it is rare to find situations where these roles are truly separate and specialized. In practice, there is almost always some overlap (and sometimes all of these roles are lumped into a single job). However, the value of this view is that it provides a mental model we can use to structure data-focused projects and teams.
What’s the difference between Data Analytics and Data Science?
Admittedly, the brief descriptions I gave for data analytics and data science seem to be redundant. This partially highlights the point that these roles often have overlap. However, key differences exist between what I am calling a data analyst and a data scientist.
From a practical perspective, there are typically 2 distinctions that separate the two roles.
The first distinction is that a data science role typically requires much more programming than a data analytics role. For example, data scientists typically use tools like Python (general), R (statisticians), and MATLAB (science & academia), while data analysts generally work with tools such as Excel, Tableau, and Power BI.
The second distinction is that data scientists typically focus on building models. This goes beyond early-stage tasks such as data visualization and EDA. Models are a central part of data science. Put simply, a model is something that lets you make predictions. It translates something you know (e.g. yesterday’s sales) into something you don’t know (e.g. tomorrow’s sales).
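To make the idea of a model concrete, here is a minimal Python sketch of the sales example. It is only an illustration under stated assumptions: the sales figures are made up, the helper names (fit_line, predict_next) are hypothetical, and a straight-line fit is just one simple choice of model, not a recommended forecasting method.

```python
# Hypothetical daily sales figures (made-up numbers, for illustration only).
daily_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Pair each day's sales (what we know) with the following day's sales
# (what we want to predict).
known_sales = daily_sales[:-1]
next_day_sales = daily_sales[1:]

slope, intercept = fit_line(known_sales, next_day_sales)


def predict_next(sales_today):
    """The 'model': maps a known value to an estimate of an unknown one."""
    return slope * sales_today + intercept


forecast = predict_next(daily_sales[-1])
print(f"Forecast for the next day: {forecast:.1f}")
```

Everything before predict_next is about learning the relationship from past data. The function itself is the model, since it turns a number you have into an estimate of one you don’t.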
What’s all the hype about?
There’s no doubt that data science is a hot field these days. Like many things that go viral, it can be difficult to distinguish the value from the hype. In this case, and in my (biased) opinion, the hype is real. Data science's value and potential upside warrant the attention and investment it receives.
I see data science as a universal toolbox for understanding and solving real-world problems. There is potential to use data science to understand and solve problems in new ways wherever there is data. With rapidly growing data volumes across all industries, using data in everyday decision-making will only become more common and necessary.
That said, all this promise means nothing if practitioners lack a solid foundation for interrogating data. That is why competent data scientists are necessary. I’d go further and say we need many competent data scientists in order to keep up with growing data volumes and our endless supply of difficult problems.
Toward this end, in the following post I will share my tips for anyone trying to get into data science. While there are many roads to Babylon (as the saying goes), these tips are a synthesis of my experience in the field.